Abstract

The deletion channel is the simplest point-to-point communication channel that models lack of
synchronization. Input bits are deleted independently with probability d, and when they are not deleted,
they are not affected by the channel. Despite significant effort, little is known about the capacity of this
channel, and even less about optimal coding schemes. In this paper we develop a new systematic
approach to this problem, by demonstrating that capacity can be computed in a series expansion for small
deletion probability. We compute three leading terms of this expansion, and find an input distribution
that achieves capacity up to this order. This constitutes the first optimal coding result for the deletion
channel.

The key idea employed is the following: We understand perfectly the deletion channel with deletion
probability d = 0. It has capacity 1 and the optimal input distribution is i.i.d. Bernoulli(1/2). It is
natural to expect that the channel with small deletion probabilities has a capacity that varies smoothly with
d, and that the optimal input distribution is obtained by smoothly perturbing the i.i.d. Bernoulli(1/2)
process. Our results show that this is indeed the case. We think that this general strategy can be useful
in a number of capacity calculations.

1 Introduction 

The (binary) deletion channel accepts bits as inputs, and deletes each transmitted bit independently with
probability d. Computing or providing systematic approximations to its capacity is one of the
outstanding problems in information theory [1]. An important motivation comes from the need to understand
synchronization errors and optimal ways to cope with them.

In this paper we suggest a new approach. We demonstrate that capacity can be computed in a series 
expansion for small deletion probability, by computing the first two orders of such an expansion. Our main 
result is the following. 

Theorem 1.1. Let $C(d)$ be the capacity of the deletion channel with deletion probability $d$. Then, for small
$d$ and any $\epsilon > 0$,

$$C(d) = 1 + d \log d - A_1 d + A_2 d^2 + O(d^{3-\epsilon}), \tag{1}$$

where

$$A_1 \equiv \log(2e) - \sum_{l=1}^{\infty} 2^{-l-1}\, l \log l \approx 1.15416377$$

/ oo oo \ 

— 2 + -ci + ^ 2"' (/ Inlf -C2Y, 2"'^^ lnl\ ^ 1.67814594 

^ V 1=1 1=1 / 



1=1 

^2 = C3 + C4 + 



$$c_2 \equiv \sum_{l=1}^{\infty} 2^{-l}\, l \ln l \approx 1.78628364$$



C3 = ^ ( -l + >>-'<j (;)log(^') -/2log/ + (/-l)(/-3)log(/-l) + G-2)log(/-2) 



-0.88636960 

00 



C4 = 5;2-(2+^) a-l)(j-3)/.(-^) 

+ E E 2-('+^+^) + j - 1) (i - 3) . « 0.69001321 



i=2 j=4 

Here $h(\cdot)$ is the binary entropy function, i.e., $h(p) = -p \log p - (1-p) \log(1-p)$.

Further, the binary stationary source defined by the property that the times at which it switches from
0 to 1 or vice versa form a renewal process with holding time distribution
$p_L(l) = 2^{-l}(1 + d(l \ln l - c_2 l/2))$, achieves rate within $O(d^{3-\epsilon})$ of capacity.
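
The constants and the run length distribution in Theorem 1.1 can be checked numerically. The following is a minimal sketch, not part of the original analysis: the helper names `series` and `p_run`, the truncation at 200 terms, and the test value d = 0.05 are illustrative assumptions. It evaluates $c_2$ and $A_1$ from their series and confirms that the stated holding time distribution sums to one.

```python
import math

def series(term, n_terms=200):
    """Sum term(l) for l = 1..n_terms; the 2^{-l} factor makes the tail negligible."""
    return sum(term(l) for l in range(1, n_terms + 1))

# c_2 = sum_l 2^{-l} l ln l, as in the theorem statement.
c2 = series(lambda l: 2.0 ** (-l) * l * math.log(l))

# A_1 = log2(2e) - sum_l 2^{-l-1} l log2 l.
A1 = math.log2(2 * math.e) - series(lambda l: 2.0 ** (-l - 1) * l * math.log2(l))

def p_run(l, d):
    """Run length law p_L(l) = 2^{-l} (1 + d (l ln l - c_2 l / 2))."""
    return 2.0 ** (-l) * (1.0 + d * (l * math.log(l) - c2 * l / 2.0))

d = 0.05  # illustrative deletion probability
total = series(lambda l: p_run(l, d))
print(f"c2 = {c2:.8f}, A1 = {A1:.8f}, sum of p_L = {total:.12f}")
```

The normalization holds for every d because $\sum_l 2^{-l} l \ln l = c_2$ and $\sum_l 2^{-l} l = 2$, so the two perturbation terms cancel.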

Given a binary sequence, we will call 'runs' its maximal blocks of contiguous 0's or 1's. We shall refer
to binary sources such that the switch times form a renewal process as sources (or processes) with i.i.d.
runs.

The 'rate' of a given binary source is the maximum rate at which information can be transmitted through
the deletion channel using input sequences distributed as the source. A formal definition is provided below
(see Definition 2.3). Logarithms denoted by log here (and in the rest of the paper) are understood to be
in base 2. While one might be skeptical about the concrete meaning of asymptotic expansions of the type
(1), they often prove surprisingly accurate. For instance at d = 0.1 (10% of the input symbols are deleted),
the expression in Eq. (1) (dropping the error term $O(d^{3-\epsilon})$) is larger than the best lower bound [2] by
about 0.007 bits. The lower bound of [2] is derived using a Markov source and 'jigsaw' decoding. Our
asymptotic analysis implies that the loss in rate due to restricting to Markov sources and jigsaw decoding
(cf. Theorem 6.1 and Remark 6.2), to leading order, is $0.904\, d^2 \approx 0.009$. Hence, we estimate that our
asymptotic approach incurs an error of about 0.002 bits for computing the capacity at d = 0.1.

More importantly, asymptotic expansions can provide useful design insight. Theorem 1.1 shows that
the stationary process consisting of i.i.d. runs with the specified run length distribution achieves capacity
to within $O(d^{3-\epsilon})$. In comparison, the best performing approach tried before this was to use a first order
Markov source for coding [2]. We are able to show, in fact, that this approach incurs a loss that is $\Omega(d^2)$,
which is the same order as the loss incurred by the trivial approach of using i.i.d. Bernoulli(1/2)!

Remark 1.2. In this work, we prove rigorous upper and lower bounds on capacity that match up to
quadratic order in d (cf. Theorem 1.1), but without explicitly evaluating the constants in the error terms.
It would be very interesting to obtain explicit expressions for these constants.

Before this work, there was no non-trivial optimal coding result known for the deletion channel.
Further terms in the capacity expansion can be expected to supply even more detailed information about 
the optimal coding scheme and allow us to achieve capacity to higher orders. 

We think that the strategy adopted here might be useful in other information theory problems. The 
underlying philosophy is that whenever capacity is known for a specific value of the channel parameter, 
and the corresponding optimal input distribution is unique and well characterized, it should be possible 
to compute an asymptotic expansion around that value. In the present context the special channel is
the perfect channel, i.e. the deletion channel with deletion probability d = 0. The corresponding input
distribution is the i.i.d. Bernoulli(1/2) process.

1.1 Related work 

Dobrushin [3] proved a coding theorem for the deletion channel, and other channels with synchronization
errors. He showed that the maximum rate of reliable communication is given by the maximal mutual
information per bit, and proved that this can be achieved through a random coding scheme. This
characterization has so far found limited use in proving concrete estimates. An important exception is provided
by the work of Kirsch and Drinea [4], who use Dobrushin's coding theorem to prove lower bounds on the
capacity of channels with deletions and duplications. We will also use Dobrushin's theorem in a crucial way,
although most of our effort will be devoted to proving upper bounds on the capacity.

Several capacity bounds have been developed over the last few years, following alternative approaches,
and are surveyed in [1]. In particular, it has been proved that $C(d) = \Theta(1-d)$ as $d \to 1$ [5]. The papers [6, 7]
improve the upper bound in this limit obtaining $\limsup_{d \to 1} C(d)/(1-d) \le 0.413$. However, determining
the asymptotic behavior in this limit (i.e. finding a constant $B_1$ such that $C(d) = B_1(1-d) + o(1-d)$) is
an open problem. When applied to the small d regime, none of the known upper bounds actually captures
the correct behavior as stated in Eq. (1). A simple calculation shows that the first upper bound in [8] has
asymptotics of $1 + (3/4)\, d \log d$. Another work [6] shows that $C \le 1 - 4.19\, d$ as $d \to 0$. As we show in the
present paper, this behavior can be controlled exactly, up to the third leading term of the expansion.

A short version of this paper was presented at the 2010 International Symposium on Information Theory
(ISIT) [9]. At the same conference, Kalai, Mitzenmacher and Sudan [10] presented a result analogous to
Theorem 1.1. The proof is based on a counting argument, very different from the techniques employed
here. Also, the result of [10] is not the same as in Theorem 1.1, since only the $d \log d$ term of the series is
established in [10]. Theorem 1.1 improves on our ISIT result [9], which contained only the first two terms
in the series expansion, but not the order $d^2$ term. Also, we obtain a non-trivial coding scheme for the
first time in this paper. The trivial i.i.d. Bernoulli(1/2) coding scheme is enough to achieve capacity up
to linear order, as shown in our conference paper [9].

1.2 Numerical illustration of results 

We can numerically evaluate the expression in Eq. (1) (dropping the error term) to obtain estimates of
capacity for small deletion probabilities:

$$C_{\rm est} = 1 + d \log d - A_1 d + A_2 d^2 .$$

The values of $C_{\rm est}$ are presented in Table 1 and Figure 1. We compare with the best known numerical
lower bounds [2] and upper bounds [6, 8].



The trivial exception is the case d = 0, for which the i.i.d. Bernoulli(1/2) process achieves capacity.
  d       Best lower bound    C_est     Best upper bound
  0.05    0.7283              0.7304    0.8160
  0.10    0.5620              0.5692    0.6890
  0.15    0.4392              0.4541    0.5790
  0.20    0.3467              0.3719    0.4910
  0.25    0.2759              0.3163    0.4200
  0.30    0.2224              0.2837    0.3620
  0.35    0.1810              0.2715    0.3150
  0.40    0.1484              0.2781    0.2750
  0.45    0.1229              0.3020    0.2410
  0.50    0.1019              0.3425    0.2120

Table 1: Table showing best known numerical bounds on capacity (from [2, 6, 8]) compared with our
estimate based on the small d expansion.



We stress here that $C_{\rm est}$ is neither an upper nor a lower bound on capacity. It is an estimate based
on taking the leading terms of the asymptotic expansion of capacity for small d, and is expected to be
accurate for small values of d. Indeed, we see in Table 1 that for $d \ge 0.4$ our estimate $C_{\rm est}$ exceeds the upper
bound. This simply indicates that we should not use $C_{\rm est}$ as an estimate for such large d. We believe that
$C_{\rm est}$ provides an excellent estimate of capacity for d < 0.2.
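
The estimate column of Table 1 can be reproduced directly from the truncated expansion. The sketch below uses the numerical values of $A_1$ and $A_2$ quoted in Theorem 1.1; the function name `c_est` is an illustrative choice.

```python
import math

A1 = 1.15416377  # constants quoted in Theorem 1.1
A2 = 1.67814594

def c_est(d):
    """Truncated expansion 1 + d log2 d - A1 d + A2 d^2 (an estimate, not a bound)."""
    return 1.0 + d * math.log2(d) - A1 * d + A2 * d * d

for d in (0.05, 0.10, 0.20, 0.40, 0.50):
    print(f"d = {d:.2f}  C_est = {c_est(d):.4f}")
```

Since the quadratic term eventually dominates, the estimate starts increasing in d between 0.35 and 0.40, which is another sign that it should only be trusted for small d.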

1.3 Notation 

We borrow $O(\cdot)$, $\Omega(\cdot)$ and $\Theta(\cdot)$ notation from the computer science literature. We define these as
follows to fit our needs. Let $f : [0,1] \to \mathbb{R}$ and $g : [0,1] \to \mathbb{R}_+$.

• We say $f = O(g)$ if there is a constant $c < \infty$ such that $|f(x)| \le c\, g(x)$ for all $x \in [0,1]$.

• We say $f = \Omega(g)$ if there is a constant $c > 0$ such that $f(x) \ge c\, g(x)$ for all $x \in [0,1]$.

• We say $f = \Theta(g)$ if there are constants $c < \infty$, $c' > 0$ such that $c\, g(x) \ge f(x) \ge c'\, g(x)$ for all $x \in [0,1]$.

Throughout this paper, we adhere to the convention that the constants c, c' above should not depend on the processes
X, Y, etc. under consideration, if there are such processes.

1.4 Outline of the paper 

Section 2 contains the basic definitions and results necessary for our approach to estimating the capacity of the deletion
channel. We show that it is sufficient to consider stationary ergodic input sources, and define their corresponding
rate (mutual information per bit). Capacity is obtained by maximizing this quantity over stationary processes. In
Section 3 we present an informal argument that contains the basic intuition leading to our main result (Theorem
1.1), and allows us to correctly guess the optimal input distribution. Section 4 states a small number of core lemmas,
and shows that they imply Theorem 1.1. Finally, Section 5 states several technical results (proved in appendices)
and uses them to prove the core lemmas. We conclude with a short discussion, including a list of open problems, in
Section 6.

Figure 1: Plot showing best known numerical bounds on capacity (from [2, 6, 8]) compared with our
estimate based on the small d expansion.



2 Preliminaries 

For the reader's convenience, we restate here some known results that we will use extensively, along with some 
definitions and auxiliary lemmas. 

Consider a sequence of channels $\{W_n\}_{n \ge 1}$, where $W_n$ allows exactly n input bits, and deletes each bit
independently with probability d. The output of $W_n$ for input $X^n$ is a binary vector denoted by $Y(X^n)$. The length of
$Y(X^n)$ is a binomial random variable. We want to find the maximum rate at which we can send information over this
sequence of channels with vanishingly small error probability.
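
As a concrete illustration of the channel $W_n$ just described, here is a minimal simulation sketch. The function name `deletion_channel`, the seed, and the parameter values are illustrative assumptions, not part of the original text.

```python
import random

def deletion_channel(bits, d, rng=random):
    """Drop each input bit independently with probability d; survivors keep their order."""
    return [b for b in bits if rng.random() >= d]

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
n, d = 10_000, 0.1
x = [rng.randrange(2) for _ in range(n)]
y = deletion_channel(x, d, rng)
print(len(y), "bits received; expected about", round(n * (1 - d)))
```

The receiver only sees `y`: the positions of the deleted bits are not revealed, which is what distinguishes this channel from an erasure channel.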

The following characterization follows from [3] . 

Theorem 2.1. Let

$$C_n = \frac{1}{n} \max_{p_{X^n}} I(X^n; Y(X^n)) .$$

Then, the following limit exists

$$C = \lim_{n \to \infty} C_n = \inf_{n \ge 1} C_n , \tag{2}$$

and is equal to the capacity of the deletion channel.

A further useful remark is that, in computing capacity, we can assume $(X_1, \dots, X_n)$ to be n consecutive
coordinates of a stationary ergodic process. We denote by $\mathcal{S}$ the class of stationary and ergodic processes that take binary
values.

Lemma 2.2. Let $X = \{X_i\}_{i \in \mathbb{Z}}$ be a stationary and ergodic process, with $X_i$ taking values in {0, 1}. Then the limit
$I(X) = \lim_{n \to \infty} \frac{1}{n} I(X^n; Y(X^n))$ exists and

$$C = \max_{X \in \mathcal{S}} I(X) .$$
We use the following natural definition of the rate achieved by a stationary ergodic process.

Definition 2.3. For stationary and ergodic X, we call $I(X) = \lim_{n \to \infty} \frac{1}{n} I(X^n; Y(X^n))$ the rate achieved by X.

Proofs of Theorem 2.1 and Lemma 2.2 are provided in Appendix A.

Given a stationary process X, it is convenient to consider it from the point of view of a 'uniformly random'
block/run. Intuitively, this corresponds to choosing a large integer n and selecting as reference point the beginning
of a uniformly random block in $X_1, \dots, X_n$. Notice that this approach naturally discounts longer blocks for finite n.
While such a procedure can be made rigorous by taking the limit $n \to \infty$, it is more convenient to make use of the
notion of Palm measure from the theory of point processes [11, 12], which is, in this case, particularly easy to define.
To a binary source X, we can associate in a bijective way a subset of times $S \subseteq \mathbb{Z}$, by letting $t \in S$ if and only if $X_t$
is the first bit of a run. The Palm measure $P_1$ is then the distribution of X conditional on the event $1 \in S$.

We denote by L the length of the block starting at 1 under the Palm measure, and denote by $p_L$ its distribution.
As an example, if X is the i.i.d. Bernoulli(1/2) process, we have $p_L = p_L^*$ where $p_L^*(l) = 2^{-l}$. We will also call $p_L$ the
block-perspective run length distribution or simply the run length distribution, and let

$$\mu(X) = \sum_{l=1}^{\infty} p_L(l)\, l$$

be its average. Let $L_0$ be the length of the block containing bit $X_0$ in the stationary process X. A standard
calculation [11, 12] yields $P(L_0 = l) = l\, p_L(l)/\mu(X)$. Since $L_0$ is well defined and almost surely finite (by ergodicity),
we necessarily have $\mu(X) < \infty$.

In our main result, Theorem 1.1, a special role is played by processes X such that the associated switch times
form a stationary renewal process. We will refer to such an X as a process with i.i.d. runs.
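
The block-perspective and bit-perspective run length laws can be compared empirically. The sketch below is illustrative only (the helper `run_lengths`, the seed and the sample size are assumptions): it draws an i.i.d. Bernoulli(1/2) sequence, estimates $p_L(l)$ and $\mu(X)$ from its runs, and forms the length-biased law $l\, p_L(l)/\mu(X)$ of the run covering a fixed bit.

```python
from collections import Counter
from itertools import groupby
import random

def run_lengths(bits):
    """Lengths of the maximal blocks of equal consecutive symbols."""
    return [sum(1 for _ in group) for _, group in groupby(bits)]

rng = random.Random(1)  # arbitrary seed
x = [rng.randrange(2) for _ in range(200_000)]
runs = run_lengths(x)[1:-1]          # drop the two boundary runs, which are truncated
mu = sum(runs) / len(runs)           # empirical mean run length, close to 2
freq = Counter(runs)
for l in range(1, 6):
    p_l = freq[l] / len(runs)        # block-perspective estimate of p_L(l), close to 2^{-l}
    biased = l * p_l / mu            # length-biased law of the run covering a fixed bit
    print(l, round(p_l, 4), round(biased, 4))
```

Longer runs cover more bit positions, so the bit-perspective law puts more weight on large l than $p_L$ does.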

3 Intuition behind the main theorem 

In this section, we provide a heuristic (non-rigorous) explanation for our main result. The aim is to build intuition and
motivate our approach, without getting bogged down with the numerous technical difficulties that arise. In fact,
we focus here on heuristically deriving the optimal input process $X^\dagger$, and do not actually obtain the quadratic term
of the capacity expansion. We find $X^\dagger$ by computing various quantities to leading order and using the following
observation (cf. Remark 4.2).

Key Observation: The process that achieves capacity for small d should be 'close' to the Bernoulli(1/2) process,
since H(X) must be close to 1.

We have, writing $Y = Y(X^n)$,

$$I(X^n; Y) = H(Y) - H(Y|X^n) . \tag{3}$$

Let $D^n$ be a binary vector containing a one at position i if and only if $X_i$ is deleted from the input vector. We can
write

$$H(Y|X^n) = H(Y, D^n|X^n) - H(D^n|X^n, Y) .$$

But Y is a function of $(X^n, D^n)$, leading to $H(Y, D^n|X^n) = H(D^n|X^n) = H(D^n) = n\,h(d)$, where we used the fact
that $D^n$ is i.i.d. Bernoulli(d), independent of $X^n$. It follows that

$$H(Y|X^n) = n\,h(d) - H(D^n|X^n, Y) . \tag{4}$$
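
The decomposition in Eq. (4) can be verified by brute force for a short, fixed input. The sketch below is illustrative (the function names, the input string and d = 0.1 are assumptions): it enumerates all deletion patterns for one particular x, so it checks the identity conditional on $X^n = x$ rather than averaged over a source.

```python
from collections import defaultdict
from itertools import product
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def entropies(x, d):
    """Return (H(Y | X = x), H(D | X = x, Y)) by enumerating all deletion patterns."""
    by_output = defaultdict(list)
    for pattern in product((0, 1), repeat=len(x)):       # 1 marks a deleted position
        k = sum(pattern)
        prob = d ** k * (1 - d) ** (len(x) - k)
        y = tuple(b for b, gone in zip(x, pattern) if not gone)
        by_output[y].append(prob)
    h_y = h_d_given_y = 0.0
    for probs in by_output.values():
        p_y = sum(probs)
        h_y -= p_y * math.log2(p_y)
        h_d_given_y -= sum(p * math.log2(p / p_y) for p in probs)
    return h_y, h_d_given_y

x, d = (0, 0, 0, 1, 1, 0, 1, 1, 1, 1), 0.1
h_y, h_d = entropies(x, d)
print(h_y, len(x) * h2(d) - h_d)   # the two numbers agree
```

For an input such as (0, 1), every deletion pattern yields a distinct output, so the ambiguity term vanishes; it becomes positive as soon as a run has length at least 2.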

The term $H(D^n|X^n, Y)$ represents ambiguity in the location of deletions, given the input and output strings.
Now, since d is small, we expect that most deletions occur in 'isolation', i.e., far away from other deletions. Make the
(incorrect) assumption that all deletions occur such that no three consecutive runs have more than one deletion in
total. In this case, we can unambiguously associate runs in Y with runs in X. Ambiguity in the location of a deletion
occurs if and only if a deletion occurs in a run of length l > 1. In this case, each of the l locations is equally likely for
the deletion, leading to a contribution of $\log l$ to $H(D^n|X^n, Y)$. Now, a run of length l should suffer a deletion with
probability $\approx l d$. Thus, we expect

$$\frac{1}{n} H(D^n|X^n, Y) \approx \frac{d}{\mu(X)} \sum_{l=1}^{\infty} p_L(l)\, l \log l .$$

We know that H(X) is close to 1, implying $\mu(X)$ is close to 2 and $p_L$ is close to $p_L^*(l) = 2^{-l}$. Expanding
$1/\mu(X)$ around $\mu(X) = 2$ and using $\sum_{l} 2^{-l}\, l \log l = c_2/\ln 2$, this leads to

$$\frac{1}{n} H(D^n|X^n, Y) \approx \frac{d}{2} \sum_{l=1}^{\infty} p_L(l)\, l \log l - \frac{c_2}{4 \ln 2}\, d\, (\mu(X) - 2) . \tag{5}$$

Consider H(Y). Now, if the input $X^n$ is drawn from a stationary process X, we expect the output to
also be a segment of some stationary process $\mathbb{Y}$. (It turns out that this is the case.) Moreover, we expect that the
channel output has $n(1-d) + o(n)$ bits, leading to $H(Y) \approx n(1-d) H(\mathbb{Y})$, where $H(\mathbb{Y})$ is the entropy rate of
$\mathbb{Y}$. Denote the run length distribution in $\mathbb{Y}$
by $q_L(\cdot)$. Define $\mu(\mathbb{Y}) = \sum_{l=1}^{\infty} q_L(l)\, l$. Let $L_Y$ denote the length of a random run drawn according to $q_L(\cdot)$. It is not
hard to see that

$$H(\mathbb{Y}) \le H(L_Y)/\mu(\mathbb{Y}) ,$$

with equality iff $\mathbb{Y}$ consists of i.i.d. runs, which occurs iff X consists of i.i.d. runs. Define $q_L^*(l) = 2^{-l}$. An explicit
calculation yields $H(L_Y)/\mu(\mathbb{Y}) = 1 - D(q_L\|q_L^*)/\mu(\mathbb{Y})$. We know that $H(\mathbb{Y})$ is close to 1, implying $\mu(\mathbb{Y})$ is close to 2 and
$D(q_L\|q_L^*)$ is small. Thus,

$$\lim_{n \to \infty} \frac{1}{n} H(Y) = (1-d) H(\mathbb{Y}) \le (1-d)\left(1 - D(q_L\|q_L^*)/\mu(\mathbb{Y})\right) \approx 1 - d - D(q_L\|q_L^*)/2 .$$

Notice that an i.i.d. Bernoulli(l/2) input results in an i.i.d. Bernoulli(l/2) output from the deletion channel. 
The following is made precise in Lemma [5.91 Let A be the 'distance' between and p*^. Then a short calculation 
tells us that the distance between p^ and qL should be 0((i^^^A). In other words p^ and qL are very nearly equal 
to each other. 

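So we obtain, to leading order,

$$\lim_{n \to \infty} \frac{1}{n} H(Y) \le 1 - d - D(p_L\|p_L^*)/2 , \tag{6}$$

with (approximate) equality iff X consists of i.i.d. runs.
Putting Eqs. (3), (4), (5) and (6) together, and using $\mu(X) = \sum_{l} p_L(l)\, l$, we have

$$I(X) = \lim_{n \to \infty} \frac{1}{n} I(X^n; Y)
\le 1 - d - \frac{1}{2} D(p_L\|p_L^*) - h(d) + \frac{c_2 d}{2 \ln 2} + \frac{d}{2} \sum_{l=1}^{\infty} p_L(l)\, l \left( \log l - \frac{c_2}{2 \ln 2} \right)$$

$$\approx 1 - d \log(1/d) - A_1 d - \frac{1}{2} D(p_L\|p_L^*) + \frac{d}{2} \sum_{l=1}^{\infty} p_L(l)\, l \left( \log l - \frac{c_2}{2 \ln 2} \right) .$$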
Since this (approximate) upper bound on I(X) depends on input X only through $p_L$, we choose X consisting of
i.i.d. runs so that (approximate) equality holds.

We expect $p_L$ to be close to $p_L^*$. A Taylor expansion gives

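$$D(p_L\|p_L^*) = \sum_{l=1}^{\infty} p_L(l) \log \frac{p_L(l)}{p_L^*(l)} \approx \frac{1}{2 \ln 2} \sum_{l=1}^{\infty} 2^{l} \left( p_L(l) - 2^{-l} \right)^2 .$$

Thus, we want to maximize

$$- \frac{1}{4 \ln 2} \sum_{l=1}^{\infty} 2^{l} \left( p_L(l) - 2^{-l} \right)^2 + \frac{d}{2} \sum_{l=1}^{\infty} p_L(l)\, l \left( \log l - \frac{c_2}{2 \ln 2} \right)$$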



subject to $\sum_{l=1}^{\infty} p_L(l) = 1$, in order to achieve the largest possible I(X). A simple calculation tells us that the
maximizing distribution is $p_L^\dagger(l) = 2^{-l}(1 + d(l \ln l - c_2 l/2))$.
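
A source with i.i.d. runs drawn from this distribution is straightforward to simulate. The following is a sketch under stated assumptions: the truncation at `l_max = 60`, the function names and the seed are illustrative, and the sequence produced starts at the beginning of a run (the Palm picture of Section 2), so it is not the stationary version of the process.

```python
import math
import random

C2 = 1.78628364  # value of c_2 quoted in Theorem 1.1

def run_length_pmf(d, l_max=60):
    """p_L(l) = 2^{-l} (1 + d (l ln l - c_2 l / 2)) on l = 1..l_max, renormalised after truncation."""
    w = [2.0 ** (-l) * (1.0 + d * (l * math.log(l) - C2 * l / 2.0)) for l in range(1, l_max + 1)]
    total = sum(w)
    return [v / total for v in w]

def sample_iid_runs(n_runs, d, rng):
    """Concatenate n_runs alternating runs with i.i.d. lengths; the first symbol is a fair coin."""
    pmf = run_length_pmf(d)
    lengths = rng.choices(range(1, len(pmf) + 1), weights=pmf, k=n_runs)
    bit, out = rng.randrange(2), []
    for l in lengths:
        out.extend([bit] * l)
        bit ^= 1
    return out

rng = random.Random(2)  # arbitrary seed
x = sample_iid_runs(5, d=0.1, rng=rng)
print(x)
```

Relative to the Bernoulli(1/2) law $2^{-l}$, the perturbation lowers the probability of runs of length 1 and 2 and raises that of longer runs.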

4 Proof of the main theorem: Outline

In this section we provide the proof of Theorem 1.1 after stating the key lemmas involved. We defer the proof of the
lemmas to the next section. Sections 5.1 to 5.4 develop the technical machinery we use, and the proofs of the lemmas
are in Section 5.6.

Given a (possibly infinite) binary sequence, a run of 0's (of 1's) is a maximal subsequence of consecutive 0's (1's),
i.e. a subsequence of 0's bordered by 1's (respectively, of 1's bordered by 0's). The first step consists in proving
achievability by estimating I(X) for a process having i.i.d. runs with appropriately chosen distribution.

Lemma 4.1. Let $X^\dagger$ be the process consisting of i.i.d. runs with distribution $p_L^\dagger(l) = 2^{-l}(1 + d(l \ln l - c_2 l/2))$. Then
for any $\epsilon > 0$, we have

$$I(X^\dagger) = 1 + d \log d - A_1 d + A_2 d^2 + O(d^{3-\epsilon}) .$$

Lemma 4.1 is proved in Section 5.6.

Lemma 2.2 allows us to restrict our attention to stationary ergodic processes in proving the converse. For a
process X, we denote by H(X) its entropy rate. Define

$$H(Y_X) \equiv \lim_{n \to \infty} \frac{H(Y(X^n))}{n(1-d)} .$$

A simple argument shows that this limit exists and is bounded above by 1 for any stationary process X and any d,
with $H(Y_X) = 1$ iff X is the i.i.d. Bernoulli(1/2) process.

In light of Lemma 4.1 we can restrict consideration to processes X satisfying $I(X) > 1 - d^{1-\epsilon}$, whence
$H(X) > 1 - d^{1-\epsilon}$ and $H(Y_X) > 1 - d^{1-\epsilon}$.

Remark 4.2. There exists $d_0(\epsilon) > 0$ such that for all $d < d_0(\epsilon)$, if $I(X) > C - d$, we have $I(X) > 1 - d^{1-\epsilon}$ and
hence also $H(X) > 1 - d^{1-\epsilon}$, $H(Y_X) > 1 - d^{1-\epsilon}$.

We define a 'super-run' next. 

Definition 4.3. A super-run consists of a maximal contiguous sequence of runs such that all runs in the sequence 
after the first one (on the left) have length one. We divide a realization of X into super-runs $\dots, S_{-1}, S_0, S_1, \dots$.
Here $S_1$ is the super-run including the bit at position 1.
Table 2: An example showing how X is divided into super-runs



See Table 2 for an example showing division into super-runs.
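
The division into super-runs of Definition 4.3 can also be sketched in code for a finite sequence. This is an illustration only: the function name `super_runs` and the example input are hypothetical (the input is not the sequence of Table 2), and a finite list needs a boundary convention that the bi-infinite definition does not.

```python
from itertools import groupby

def super_runs(bits):
    """Split a finite bit list into super-runs, each returned as its list of run lengths.

    A new super-run starts at every run of length >= 2; runs of length 1 are
    attached to the super-run on their left. Runs of length 1 at the very start
    of a finite list have no left neighbour and form a boundary super-run.
    """
    groups = []
    for _, group in groupby(bits):
        length = sum(1 for _ in group)
        if length >= 2 or not groups:
            groups.append([length])
        else:
            groups[-1].append(length)
    return groups

example = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1]   # hypothetical input, not the sequence of Table 2
print(super_runs(example))
```

In each returned group the first entry is the length of the leading run and the remaining entries are all 1, corresponding to the alternating bits that follow it.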

Denote by $\mathcal{S}$ the set of all stationary ergodic processes and by $\mathcal{S}_{L^*}$ the set of stationary ergodic processes such
that, with probability one, no super-run has length larger than $L^*$.

Our next lemma tightens the constraint given by Remark 4.2 further for processes in $\mathcal{S}_{\lfloor 1/d \rfloor}$.

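Lemma 4.4. Consider any $\epsilon > 0$ and constant $\kappa$. There exists $d_0(\epsilon, \kappa) > 0$ such that the following happens for any
$X \in \mathcal{S}_{\lfloor 1/d \rfloor}$. For any $d < d_0$, if

$$I(X) > C - \kappa\, d^{2-(\epsilon/2)} ,$$

then

$$H(Y_X) > 1 - d^{2-\epsilon} .$$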

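We show an upper bound for the restricted class of processes $\mathcal{S}_{\lfloor 1/d \rfloor}$.

Lemma 4.5. For any $\epsilon > 0$ there exist $d_0 = d_0(\epsilon) > 0$ and $\kappa < \infty$ such that the following happens. If $d < d_0(\epsilon)$,
for any $X \in \mathcal{S}_{\lfloor 1/d \rfloor}$,

$$I(X) \le 1 + d \log d - A_1 d + A_2 d^2 + \kappa\, d^{3-\epsilon} .$$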

Finally, we show a suitable reduction from the class $\mathcal{S}$ to the class $\mathcal{S}_{L^*}$.

Lemma 4.6. For any $\epsilon > 0$ there exists $d_0 = d_0(\epsilon) > 0$ such that the following happens for all $d < d_0$, and all $\gamma > 0$.
For any $X \in \mathcal{S}$ such that $H(Y_X) > 1 - d^{\gamma}$ and for any $L^* > 2\gamma \log(1/d)$, there exists $X_{L^*} \in \mathcal{S}_{L^*}$ such that

$$I(X) \le I(X_{L^*}) + d^{\gamma-\epsilon} (L^*)^{-1} \log L^* , \tag{8}$$

$$H(Y_X) \ge H(Y_{X_{L^*}}) - d^{\gamma-\epsilon} (L^*)^{-1} \log L^* . \tag{9}$$

Lemmas 4.4, 4.5 and 4.6 are proved in Section 5.6.

The proof of Theorem 1.1 follows from these lemmas with Lemma 4.6 being used twice.



Proof of Theorem 1.1. Lemma 4.1 shows achievability. For the converse, we start with a process $X \in \mathcal{S}$ such that
$I(X) > C - d^3$. By Remark 4.2, $H(Y_X) > 1 - d^{1-\delta}$ for any $\delta > 0$ and $d < d_0(\delta)$. Use Lemma 4.6 with $\gamma = 1 - \delta$,
$L^* = \lfloor 1/d \rfloor$ and $\epsilon = \delta/2$. It follows that for $d < d_0(\delta/2)$,

$$I(X_{L^*}) > C - d^{2-2\delta} ,$$

$$H(Y_X) > H(Y_{X_{L^*}}) - d^{2-2\delta} .$$

We now use Lemma 4.4 which yields $H(Y_{X_{L^*}}) > 1 - d^{2-2\delta}$ and hence, by Eq. (9), $H(Y_X) > 1 - 2 d^{2-2\delta} > 1 - d^{2-3\delta}$
for small d. Now, we can use Lemma 4.6 again with $\gamma = 2 - 3\delta$, $L^* = \lfloor 1/d \rfloor$, $\epsilon = \delta/2$. We obtain

$$I(X) \le I(X_{L^*}) + d^{3-4\delta} .$$

Finally, using Lemma 4.5 we get the required upper bound on C. □
5 Proofs of the Lemmas



In Section 5.1 we show that, for any stationary ergodic X that achieves a rate close to capacity, the run-length
distribution must be close to the distribution obtained for the i.i.d. Bernoulli(1/2) process. In Section 5.2 we
suitably rewrite the rate I(X) achieved by stationary ergodic process X as the sum of three terms. In Section 5.3
we construct a modified deletion process that allows accurate estimation of H(Y\X") in the small d limit. Section 
15.41 proves a key bound on H{Yx) that leads directly to Lemma Finally, in Section we present proofs of the 
Lemmas quoted in Section |4] using the tools developed. 

We will often write $X_a^b$ for the random vector $(X_a, X_{a+1}, \dots, X_b)$ where the $X_i$'s are distributed according to the
process X.

5.1 Characterization in terms of runs 

Let $m_n$ be the number of runs in $X^n$. Let $L_1, L_2, \dots, L_{m_n}$ be the run lengths ($L_1$ being the length of the intersection
of that run with $X^n$). It is clear that $H(X^n) \le 1 + H(m_n, L_1, L_2, \dots, L_{m_n})$ (where one bit is needed to remove the 0, 1
ambiguity). By ergodicity $m_n/n \to 1/E[L]$ almost surely as $n \to \infty$. Also $m_n \le n$ implies $H(m_n)/n \le \log n/n \to 0$.
Further, $\limsup_{n \to \infty} H(L_1, L_2, \dots, L_{m_n})/n \le \lim_{n \to \infty} H(L)\, m_n/n = H(L)/E[L]$. If H(X) is the entropy rate of the
process X, by taking the $n \to \infty$ limit, it is easy to deduce that

$$H(X) \le \frac{H(L)}{E[L]} , \tag{10}$$

with equality if and only if X is a process with i.i.d. runs with common distribution $p_L$.

We know that given $E[L] = \mu$, the probability distribution with largest possible entropy H(L) is geometric with
mean $\mu$, i.e. $p_L(l) = (1 - 1/\mu)^{l-1}/\mu$ for all $l \ge 1$, leading to

$$\frac{H(L)}{E[L]} \le -\left(1 - \frac{1}{\mu}\right) \log\left(1 - \frac{1}{\mu}\right) - \frac{1}{\mu} \log \frac{1}{\mu} \equiv h(1/\mu) . \tag{11}$$

Here we introduced the notation $h(p) \equiv -p \log p - (1-p) \log(1-p)$ for the binary entropy function.
Using this, we are able to obtain sharp bounds on $p_L$ and $\mu(X)$.

Lemma 5.1. There exists $d_0 > 0$ such that the following occurs. For any $\beta > 1/2$ and $d < d_0$, if $X \in \mathcal{S}$ is such that
$H(X) > 1 - d^{\beta}$, we have

$$|\mu(X) - 2| \le 7 d^{\beta/2} . \tag{12}$$

Proof. By Eqs. (10) and (11), we have $h(1/\mu) \ge 1 - d^{\beta}$. By Pinsker's inequality $h(p) \le 1 - (1-2p)^2/(2 \ln 2)$, and
therefore $|1 - (2/\mu)|^2 \le (2 \ln 2)\, d^{\beta}$. The claim follows from simple calculus. □

Lemma 5.2. There exist $d_0 > 0$ and $\kappa' < \infty$ such that the following occurs for any $\beta > 1/2$ and $d < d_0$. For any
$X \in \mathcal{S}$ such that $H(X) > 1 - d^{\beta}$, we have

$$\sum_{l=1}^{\infty} \left| p_L(l) - \frac{1}{2^l} \right| \le \kappa' d^{\beta/2} . \tag{13}$$

Proof. Let $p_L^*(l) = 1/2^l$, $l \ge 1$, and recall that $\mu(X) = E[L] = \sum_{l \ge 1} p_L(l)\, l$. An explicit calculation yields

$$H(L) = \mu(X) - D(p_L\|p_L^*) . \tag{14}$$

Now, by Pinsker's inequality,

$$D(p_L\|p_L^*) \ge \frac{2}{\ln 2} \|p_L - p_L^*\|_{\rm TV}^2 . \tag{15}$$

Combining Lemma 5.1 and Eqs. (10), (14) and (15), we get the desired result.
□
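
The entropy identity $H(L) = \mu(X) - D(p_L\|p_L^*)$, the maximum-entropy bound (11) and Pinsker's inequality can all be checked numerically on a small example. The sketch below is illustrative: the distribution `p` is a hypothetical run length law with finite support, and `check_identities` is a helper name introduced here.

```python
import math

def check_identities(p):
    """p[l-1] = p_L(l) on l = 1..len(p). Returns (H(L), mu, D(p_L || p*_L), TV distance)."""
    tail = 2.0 ** (-len(p))                      # mass of p*_L beyond the support of p
    H = -sum(q * math.log2(q) for q in p if q > 0)
    mu = sum(l * q for l, q in enumerate(p, start=1))
    D = sum(q * math.log2(q * 2.0 ** l) for l, q in enumerate(p, start=1) if q > 0)
    tv = 0.5 * (sum(abs(q - 2.0 ** (-l)) for l, q in enumerate(p, start=1)) + tail)
    return H, mu, D, tv

p = [0.45, 0.30, 0.15, 0.07, 0.03]               # hypothetical run length law with finite support
H, mu, D, tv = check_identities(p)
print(H, mu - D)                                  # equal: H(L) = mu - D(p_L || p*_L)
print(H / mu, "<=", -(1 - 1/mu) * math.log2(1 - 1/mu) - (1/mu) * math.log2(1/mu))
print(D, ">=", 2 / math.log(2) * tv ** 2)         # Pinsker, with D measured in bits
```

The identity is exact because $\log(1/p_L^*(l)) = l$, so the cross-entropy term of the divergence is precisely the mean run length.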
For the rest of Section 5.1, we only state our technical estimates, deferring proofs to Appendix B.
We now state a tighter bound on probabilities of large run lengths. We will find this useful, for instance, to
control the number of bit flips in going from general X to $X_{L^*}$ having bounded run lengths.

Lemma 5.3. There exists $d_0 > 0$ such that the following occurs: Consider any $\beta > 1/2$, and define $\ell = \lfloor 2\beta \log(1/d) \rfloor$.
For all $d < d_0$, if $X \in \mathcal{S}$ is such that $H(X) > 1 - d^{\beta}$, we have

$$\sum_{l=\ell}^{\infty} l\, p_L(l) \le 20\, d^{\beta} . \tag{16}$$


We use L(k) to denote the vector of lengths $(L_1, L_2, \dots, L_k)$ of a randomly selected block of k consecutive runs
(a 'k-block'). Formally, $(L_1, L_2, \dots, L_k)$ is the vector of lengths of the first k runs starting from bit $X_1$, under the
Palm measure $P_1$ introduced in Section 2.

Corollary 5.4. There exists dp > such that the following occurs: Consider any positive integer k and any jS > 1/2, 
and define I = [2/31og(l/(i)J . For all d < do, i/X G 5 is such that _ff(X) > 1 — d^ , we have 

{h + ... + lk)PL{k){h,---M <20edf'. (17) 

ii+...+ik>ke 



Clearly, $E[L_1 + \dots + L_k] = k\,\mu(X)$. We have

H{Li,L2, ...,Lk) 



A stronger form of Lemma 5.2 follows.



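Lemma 5.5. Let $p^*_{L(k)}(l_1, \dots, l_k) = 2^{-\sum_{i=1}^{k} l_i}$. For the same $\kappa'$ and $d_0 > 0$ as in Lemma 5.2, the following occurs.
Consider any positive integer k and any $\beta > 1/2$. For all $d < d_0$, if $X \in \mathcal{S}$ is such that $H(X) > 1 - d^{\beta}$, we have

$$\sum_{l_1=1}^{\infty} \sum_{l_2=1}^{\infty} \cdots \sum_{l_k=1}^{\infty} \left| p_{L(k)}(l_1, \dots, l_k) - p^*_{L(k)}(l_1, \dots, l_k) \right| \le \kappa' \sqrt{k}\, d^{\beta/2} .$$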



We now relate the run-length distribution in X and in $Y(X^n)$ (as $n \to \infty$). For this, we first need a
characterization of Y in terms of a stationary ergodic process. Let $D = (\dots, D_0, D_1, D_2, \dots)$ be an i.i.d. Bernoulli(d) process,
independent of X. Construct Y as follows. Look at $X_1, X_2, \dots$. Delete bits corresponding to $D_1, D_2, \dots$. The bits
remaining are $Y_1, Y_2, \dots$ in order. Similarly, in $X_0, X_{-1}, X_{-2}, \dots$ delete bits corresponding to $D_0, D_{-1}, D_{-2}, \dots$. The
bits remaining are $Y_0, Y_{-1}, \dots$ in order.

Proposition 5.6. The process Y is stationary and ergodic for any stationary ergodic X.

Notice on the other hand that (X, Y) are not jointly stationary.

The channel output $Y(X^n)$ is then $Y_1^M$ where $M \sim \mathrm{Binomial}(n, 1-d)$. It is easy to check that

$$H(Y) = H(Y_X)$$

(cf. the definition of $H(Y_X)$ in Section 4). We will henceforth use H(Y) instead of the more cumbersome notation $H(Y_X)$.

Let $q_L$ denote the block-perspective run-length distribution for Y. Denote by $q_{L(k)}$ the block-perspective
distribution for k-blocks in Y. Lemmas 5.1, 5.2, 5.3, 5.5 and Corollary 5.4 hold for any stationary ergodic process, hence
they hold true if we replace (X, p) with (Y, q).

In proving the upper bound, it turns out that we are able to establish a bound of $H(Y) > 1 - d^{2-\epsilon}$ for $\epsilon > 0$
and small d, but no corresponding bound for H(X). Next, we establish that if H(Y) is close to 1, this leads to tight
control over the tail for $p_L(\cdot)$. This is a corollary of Lemma 5.3.
Lemma 5.7. There exists $d_0 > 0$ such that the following occurs: Consider any $\gamma > 1/2$, and define $\ell = \lfloor 2\gamma \log(1/d) \rfloor$.
For all $d < d_0$, if $H(Y) > 1 - d^{\gamma}$, we have

$$\sum_{l=2\ell}^{\infty} l\, p_L(l) \le 80\, d^{\gamma} .$$

Note that $p_L$ refers to the block length distribution of X, not Y.

Corollary 5.8. There exists do > such that the following occurs: Consider any positive integer k and 7 > 1/2, 
and define i = [27log(l/d)J . For all d < do, if H{Y) > 1 — d^, we have 

00 

{h + ... + lk)PL(k){h.---M) <80k^d^. 

l—2klQ 

Consider X being i.i.d. Bernoulli(1/2). Clearly, this corresponds to Y also i.i.d. Bernoulli(1/2). Hence, each has
the same run length distribution $p_L^*(l) = q_L^*(l) = 2^{-l}$. This happens irrespective of the deletion probability d. Now
suppose X is not i.i.d. Bernoulli(1/2) but approximately so, in the sense that H(X) is close to 1. The next lemma
establishes that, in this case also, the run length distribution of Y is very close to that of X, for small run lengths
and small d.

Lemma 5.9. There exist a function $(\kappa, \epsilon) \mapsto d_0(\kappa, \epsilon) > 0$ and constants $\kappa_1 < \infty$, $\kappa_2 < \infty$ such that the following
happens, for any $\beta \in (1/2, 2)$, $\epsilon > 0$ and $\kappa < \infty$.

(i) For all $d < d_0$, for all X such that $H(X) > 1 - d^{\beta}$, and all $l < \kappa \log(1/d)$, we have

$$|p_L(l) - q_L(l)| \le \kappa_1 d^{1+\beta/2-\epsilon} .$$

(ii) For all $d < d_0$ and all X such that $H(X) > 1 - d^{\beta}$, we have

$$|\mu(X) - \mu(Y)| \le \kappa_2 d^{1+\beta/2} . \tag{18}$$

Let us emphasize that $\kappa_1, \kappa_2$ do not depend at all on $\beta, \epsilon, \kappa$, whereas $d_0$ does not depend on $\beta$ in the above
lemma. Analogous comments apply to the remaining lemmas in this section.

As before, we are able to generalize this result to blocks of k consecutive runs. 

Lemma 5.10. There exist a function (k, e) i— t- do{n,e) > and a constant k < 00 such that the following happens, 
for any j3 € (1/2, 2), e > and k < 00. 

For all d < do, for all integers k > and {li,l2, ■ ■ ■ ,lk) such that X]i=i < '*log(l/d), and all X such that 
H{T) >l-d'^, we have 

\PLik){li, . . . , Zfc) - QLik) (^1, . . . , lk)\ < k' . 

In proving the lower bound, we have $H(X^\dagger) = 1 - O(d^2)$, but no corresponding bound for H(Y). The next lemma
allows us to get tight control over the tail of $q_L(\cdot)$.

Lemma 5.11. For any e > 0, there exists do = do{e) > such that the following occurs: Consider any j3 £ (1/2, 2], 
and define I = [41og(l/d)J . For all d < do, if H(K) > 1 - d^ , we have 

00 
1=1 

Define $p^*_{L(k)}(l_1, \dots, l_k) = 2^{-\sum_{i=1}^{k} l_i}$. We show, using Lemma 5.10, that if H(Y) is close to 1, then one can bound
the distance between $p_{L(k)}(\cdot)$ and $p^*_{L(k)}(\cdot)$.
Lemma 5.12. There exist a function $(\kappa, \epsilon) \mapsto d_0(\kappa, \epsilon) > 0$ and constants $\kappa_1 < \infty$, $\kappa_2 < \infty$ such that the following
happens, for any $\epsilon > 0$ and $\kappa < \infty$.

(i) For all d < do, all sources X such that i7(X) > 1 — d'^'^ and H{Y) > 1 — d'^, and all integers k > and (Zi, I2, ■ ■ ■ , Ik) 
such that X^iLi h < '^log(l/d), we have 

\pL(k){h,.-.M)-pUu){h,...,h)\ < (19) 
\pL^k){lu---Jk)~qLik){h,---,lk)\ < d'+^/'-'. (20) 

(ii) For all d < dg, all sources X such that _ff(X) > 1 — d" ® and H{Y) > 1 — d"' , we have 

lAi(X) - 2| < , (21) 

|Ai(X)-^(Y)| <At2d'+^/'. (22) 

The next lemma assures us that if $X \in \mathcal{S}_{\lfloor 1/d \rfloor}$, then very few runs in Y are much longer than $\lfloor 1/d \rfloor$. In fact, we show that $q_L(\lambda \lfloor 1/d \rfloor)$ decays exponentially in $\lambda$.

Lemma 5.13. There exists d^ > such that, for all d < dg, the following occurs: Consider any X e ^li/d\ such 
that i/(X) > 1 — d"^/^. Then, for all \ > 2 such that A[l/dJ is an integer, we have 

qL{X[l/d\)<d^-\ 



Next, we prove some analogous results for super-runs (cf. Definition 4.3), which we also need.

We denote by $L^{\rm rep}$ the length of the first run in a random super-run and by $L^{\rm alt}$ the total length of the remaining runs of the same super-run. More precisely, we repeat here the construction of Section 2 and define a new Palm measure, $\mathbb{P}_{\rm sr}$, which is the measure of X conditional on $X_1$ being the first bit of a super-run. Then $L^{\rm rep}$ is the length of the first run of this super-run, and $L^{\rm alt}$ is the residual length of the same super-run, always under the Palm measure $\mathbb{P}_{\rm sr}$. Here 'rep' indicates 'repeated', with $L^{\rm rep}$ being the number of repeated bits, and 'alt' indicates 'alternating', with $L^{\rm alt}$ being the number of alternating bits. We denote the type of a random super-run by $T = (L^{\rm rep}, L^{\rm alt})$ and the length by $L = L^{\rm alt} + L^{\rm rep}$. We need versions of Lemmas 5.3 and 5.7 for super-runs.

Define /I(X) = 1/E[L]. It is easy to see that 

if(X)<ffi. (23) 
^(X) 

We denote by pf the distribution of T. Define p~,{li,l2) = 2^'^^'^, this being the distribution for the i.i.d. 
Bernoulli(l/2) process X*. We denote by pj^ the distribution of L in X. Clearly, 

Lemma 5.14. There exists $d_0 > 0$ such that the following occurs. For any $\beta > 1/2$ and $d < d_0$, if $X \in \mathcal{S}$ is such that $H(X) > 1 - d^{\beta}$, we have

|m(X)-4| <4d^/2_ 



Lemma 5.15. There exists $d_0 > 0$ such that the following occurs: Consider any $\beta > 1/2$, and define $\ell = \lfloor 2\beta \log(1/d) \rfloor$. For all $d < d_0$, if $X \in \mathcal{S}$ is such that $H(X) > 1 - d^{\beta}$, we have

00 

Y,lpi{l)<AMP . 

1=1 
Let qf^{-) the distribution of super-run lengths in Y, and /i(Y) denote the mean length of a super-run in Y.

Lemma 5.16. There exists do > such that the following occurs: Consider any 7 > 1/2, and define £ = 
[27log(l/d)J . For all d < do, if H{X) > 1 - and H{Y) > 1 - d^ , we have 

00 

i=e 

Note that pj^ refers to the super-run length distribution of X, not Y. 

Corollary 5.17. There exists do > such that the following occurs: Consider any positive integer k, any 7 > 1/2, 
and define £ = [27log(l/d)J . For all d < do, if H{X) > 1 - d°-^ and H{Y) > 1 - d^^ , we have 

00 

J2 {li + ... + lk)PHk){h,.-.M) <80k^d^. 

h+...+ik>ke 



Proofs of all results stated in Section 5.1 above (except the first two) are available in Appendix B.

5.2 Rate achieved by a process

We make use of an approach similar to that of Kirsch and Drinea [4] to evaluate $I(X)$ for a stationary ergodic process X that may be used to generate an input for the deletion channel. A fundamental difference is that [4] only considers processes with i.i.d. runs. Our analysis is instead general. This enables us to obtain tight upper and lower bounds (up to $O(d^{3-\epsilon})$), hence leading to an estimate for the channel capacity.

We depart from the notation of Kirsch and Drinea, retaining $X_i$ for the ith bit of X, and using $Y(j)$ to denote the jth run in $Y(X^n)$. Denote by $L_1, L_2, \ldots, L_m$ the lengths of runs in $X^n$ (where m is a non-decreasing function of n for any fixed X). Let the ith run consist of $b(i)$'s, where $b(i) \in \{0, 1\}$. For instance, if the first run consists of 0's, then $b(i) = i + 1 \pmod 2$.

We use $X(j)$ to denote the concatenation of runs in X that led to $Y(j)$, with the first run in $X(j)$ contributing at least one bit (if the run is completely deleted, then it is part of $X(j-1)$). $X(1)$ is an exception. This is made precise in Table 3, which is essentially the same as [4, Figure 1], barring changes in notation. We call the runs in $X(j)$ the parent runs of the run $Y(j)$.

We define $K(X^n)$ as the vector of $|X(j)|$. Let the total number of runs in $Y(X^n)$ be M. Thus,

$Y(X^n) = Y(1) \ldots Y(M-1) Y(M)$,

$X^n = X(1) \ldots X(M-1) X(M)$,

$K(X^n) = (|X(1)|, \ldots, |X(M-1)|)$.

Note that $X(j)$ consists of an odd number of runs for $1 < j < M$.
We write

$I(X^n; Y(X^n)) = H(Y) - H(Y, K | X^n) + H(K | X^n, Y)$, (24)

which is analogous to the identity $I(X^n; Y(X^n)) = H(X^n) - H(X^n, K | Y) + H(K | X^n, Y)$ used in [4], but more convenient for our proof.

Let $L_Y$ be an integer random variable having the distribution $q_L$, i.e. the distribution of run length in Y. It is easy to see that

$\lim_{n\to\infty} \frac{H(Y(X^n))}{n(1-d)} = H(Y) \le \frac{H(L_Y)}{\mu(Y)}$

holds, similar to (fTO|) . It turns out that this suffices for our upper bound (cf. Lemma [4. 4|) . 
Set X(1) = Y(1) = the empty string.
j ← 1
For i = 1 to m do
    α ← b(i)^{L_i}
    ω ← the bits in Y that arise from the ith run in X
    % ω is a (possibly empty) string of all b(i)'s.
    % Y(j) is a (possibly empty) string of all b(j)'s.
    If b(i) = b(j) or |ω| = 0 then
        % ω is contained in the current block Y(j) of Y
        Y(j) ← Y(j)ω
        X(j) ← X(j)α
    Else  % ω is a prefix of Y(j + 1)
        j ← j + 1
        Y(j) ← Y(j)ω
        X(j) ← X(j)α
    End If
End For



Table 3: Procedure for generating Y{l),Y{2), . . . ,Y{M) and X{l),X{2), . . .,X{M) given X" and 
(adapted from [4, Figure 1]).
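The procedure in Table 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `parent_blocks`, the representation of $X^n$ as a string of '0'/'1' characters, and the 0/1 list `deleted` (a 1 marks a deleted position) are all choices made for this example. Blocks are indexed from 0, so `X_blocks[0]` plays the role of $X(1)$.

```python
def parent_blocks(x, deleted):
    """Split input x and the channel output into blocks X(j), Y(j) as in Table 3.

    x       : non-empty string over {'0', '1'} (hypothetical encoding of X^n)
    deleted : list of 0/1 flags, deleted[t] == 1 if bit t is deleted
    """
    assert x and len(x) == len(deleted)

    # Run-length decomposition of x: (bit, start, end) with end exclusive.
    runs, start = [], 0
    for pos in range(1, len(x) + 1):
        if pos == len(x) or x[pos] != x[start]:
            runs.append((x[start], start, pos))
            start = pos

    first_bit = x[0]
    other_bit = "1" if first_bit == "0" else "0"
    X_blocks, Y_blocks = [""], [""]
    j = 0
    for bit, s, e in runs:
        alpha = x[s:e]  # the whole run, b(i) repeated L_i times
        omega = "".join(x[t] for t in range(s, e) if not deleted[t])
        block_bit = first_bit if j % 2 == 0 else other_bit  # bit value of Y(j)
        if bit == block_bit or len(omega) == 0:
            # omega extends the current output run (or the run vanished)
            Y_blocks[j] += omega
            X_blocks[j] += alpha
        else:
            # omega starts a new output run
            j += 1
            Y_blocks.append(omega)
            X_blocks.append(alpha)
    return X_blocks, Y_blocks
```

For instance, deleting the isolated 0 in "0011101" merges the two surrounding runs of 1s, so the second output run has three parent runs, in line with the remark that $X(j)$ consists of an odd number of runs.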



Consider the second term in Eq. (24). Let $D^n$ denote the n-bit binary vector that indicates which bit locations in $X^n$ have suffered deletions. We have

$H(Y, K | X^n) = H(D^n | X^n) - H(D^n | X^n, Y, K)$

$= n h(d) - H(D^n | X^n, Y, K)$. (25)

We study $H(D^n | X^n, Y, K)$ by constructing an appropriate modified deletion process in Section 5.3.
Consider the third term in Eq. (24). From [4], we know that

$\lim_{n\to\infty} \frac{H(K | X^n, Y)}{n} = \lim_{n\to\infty} \frac{H(\,|X(2)|\; \big|\; X(2) \ldots X(M),\, Y(2) \ldots Y(M))}{\mathbb{E}[|X(2)|]}$

Here $X(2) \ldots X(M)$ denotes the string obtained by concatenating $X(2), \ldots, X(M)$, without separation marks, and analogously for $Y(2) \ldots Y(M)$. Roughly, single deletions do not lead to ambiguity in $|X(2)|$ if $X(2) \ldots$ and $Y(2) \ldots$ are known. Thus, this term is $O(d^2)$. It turns out that we can get a good estimate for this term by computing it for the i.i.d. Bernoulli(1/2) case.

Lemma 5.18. For any e > 0, there exists do = do{e) > 0, and n < oo such that for all d < do the following occurs: 
Consider any X G ^li/d\ such that iJ(X) > 1 — d^~'^ and max{iJ(X), i?(Y)} > 1 — for some 7 £ (1/2,2). Then 



lim - H{K{X")\X",Y{X" j) - d^C4 

n— >-oo 77, 



< , (26) 
where



C4^5]2-(2+^) {j^l){j-3)h 

i=2 j=4 \ "T J / 

Note that with 7 = 2 — e/2, we obtain \5\ < k(P~'^. 

The proof of Lemma 5.18 is quite technical and uses a modified deletion process (cf. Section 5.3). We defer it to Appendix C.

Lemma 5.19. For any e > 0, there exists do = do{e) > such that if H{Y) > 1 — d"^^^/^ , then 

CO 

H(Y) < 1 - - ^ (0 ( log qL (0 + + rf'^' : 
/=1 

/or a/Z d < do- 

The proof of this lemma is fairly straightforward. 

Proof of Lemma 5.19. An explicit calculation yields $H(q_L) = \mu(Y) - D(q_L \| q_L^*)$, where $q_L^*$ is the run length distribution corresponding to the i.i.d. Bernoulli(1/2) process (cf. proof of Lemma 5.2). We know $H(Y) \le H(q_L)/\mu(Y)$. It follows that

$H(Y) \le 1 - D(q_L \| q_L^*)/\mu(Y)$. (27)

Using Lemma I5.12f ii') , we deduce that 

<-d'-'/\ 
- 3 

and, in particular, /i(Y) < 3 for small d. Hence, substituting in Eq. (P7)) and using the lower bound H{Y) > 1 — d^^^/^ 
we have D{qL\\q1) < 3d^~'^/^. Exphcit calculation gives D{qL\\q1) ~ J2^i 9^(0(1089^(0 + 0- The result follows by 
plugging into Eq. (gT]). □ 
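The identity $H(q_L) = \mu(Y) - D(q_L \| q_L^*)$ used in this proof can be checked numerically. The snippet below is an illustrative sketch only: it assumes base-2 logarithms, takes $q_L^*(l) = 2^{-l}$, and uses an arbitrary made-up run length distribution `q` (not one from the paper).

```python
from math import log2

def entropy(q):
    """Shannon entropy (bits) of a distribution given as {length: probability}."""
    return -sum(p * log2(p) for p in q.values() if p > 0)

def mean_length(q):
    """Mean run length under q."""
    return sum(l * p for l, p in q.items())

def divergence_from_geometric(q):
    """D(q || q*) in bits, with q*(l) = 2^{-l}; equals sum_l q(l) (log2 q(l) + l)."""
    return sum(p * (log2(p) + l) for l, p in q.items() if p > 0)

# Arbitrary example distribution over run lengths (an assumption for illustration).
q = {1: 0.4, 2: 0.3, 3: 0.2, 5: 0.1}
lhs = entropy(q)
rhs = mean_length(q) - divergence_from_geometric(q)
```

Here `lhs` and `rhs` agree up to floating point error, which is exactly the statement that the divergence equals $\sum_l q_L(l)(\log q_L(l) + l)$.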



1 1 



5.3 A modified deletion process 

We want to get a handle on the term $H(D^n | X^n, Y, K)$. The main difficulty in achieving this is that a fixed run in Y can arise from parent runs via a countable infinity of different deletion 'patterns'. For example, consider that a run in Y may have any odd number of parent runs. Moreover, a countable infinity of these deletion patterns 'contribute' to $H(D^n | X^n, Y, K)$.

However, we expect that deletions are typically well separated at small deletion probabilities, and as a result, there are only a few dominant 'types' of deletion patterns that influence the leading order terms of $H(D^n | X^n, Y, K)$. Deletions that 'act' in isolation from other deletions should contribute an order d term: for instance, a positive fraction of runs in $X^n$ should have length 4, and with probability of order d, they should shrink to runs of length 3 in Y due to one deletion. Each time this occurs, there are four (equally likely) candidate positions at which the one deletion occurred, contributing $\log(4)$ to $H(D^n | X^n, Y, K)$. Similarly, pairs of 'nearby' deletions (for instance in the same run of $X^n$) should contribute a term of order $d^2$. We should be able to ignore instances of more than two deletions occurring in close proximity, since (intuitively) they should have a contribution of $O(d^3)$ to $H(D^n | X^n, Y, K)$.

We formalize this intuition by constructing a suitable modified deletion process that allows us to focus on the dominant deletion patterns in our estimate of this term. We bound the error in our estimate due to our modification of the deletion process, leading to an estimate of $H(D^n | X^n, Y, K)$ that is exact up to order $d^2$.
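The order-d heuristic above can be made concrete for i.i.d. Bernoulli(1/2) input. As a rough sketch (the function name and the truncation length are assumptions of this example): there is on average one run per two input bits, a run has length l with probability $2^{-l}$, it suffers a single deletion with probability about $l d$, and the position of that deletion is then uniform over l candidates, contributing $\log_2 l$ bits. Summing gives a contribution of about $d \cdot \frac{1}{2}\sum_{l \ge 2} 2^{-l}\, l \log_2 l$ per input bit.

```python
from math import log2

def single_deletion_coefficient(max_len=200):
    """Coefficient of d in the leading-order ambiguity for i.i.d. Bernoulli(1/2) input.

    Computes (1/2) * sum_{l>=2} 2^{-l} * l * log2(l), truncated at max_len;
    the neglected tail is far below floating point precision.
    """
    return 0.5 * sum(2.0 ** (-l) * l * log2(l) for l in range(2, max_len + 1))

coefficient = single_deletion_coefficient()  # roughly 1.2885
```

So, to first order and under these assumptions, isolated deletions account for roughly 1.29 d bits of ambiguity per input bit.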
We restrict attention to $X \in \mathcal{S}_{\lfloor 1/d \rfloor}$. Denote by $R_j$ the jth run in X (where the run including bit 1 is labeled $R_1$). $R_j$ has length $L_j$. Recall that the deletion process D is an i.i.d. Bernoulli(d) process, independent of X, with $D^n$ being the n-bit vector that contains a 1 if and only if the corresponding bit in $X^n$ is deleted by the channel $W_n$. We define an auxiliary sequence of channels $\hat W_n$ whose output, denoted by $\hat Y(X^n)$, is obtained by modifying the deletion channel output: $\hat Y(X^n)$ contains all bits present in $Y(X^n)$ and some of the deleted bits in addition. Specifically, whenever there are three or more deletions in a single run $R_i$ under D, the run $R_i$ suffers no deletions in $\hat Y(X^n)$.

Formally, we construct this sequence of channels when the input is a stationary process X as follows. For all integers i, define:

$Z^i$ = Binary process that is zero throughout except if $R_i$ contains 3 or more deletions, in which case $Z^i_l = 1$ if and only if $X_l \in R_i$ and $D_l = 1$.

Define

$Z = \bigvee_{i=-\infty}^{\infty} Z^i$,

where $\vee$ here denotes bitwise OR. Finally, define $\hat D(D, X) = D \oplus Z$ (where $\oplus$ is componentwise sum modulo 2). The output of the channel $\hat W_n$ is simply defined by deleting from $X^n$ those bits whose positions correspond to 1s in $\hat D$. We define $\hat K(X^n)$ for the modified deletion process in the same way as $K(X^n)$. The sequence of channels $W_n$ is defined by D, and the coupled sequence of channels $\hat W_n$ is defined by $\hat D$. We emphasize that $\hat D$ is a function of (X, D).

Note that if $D_i = 0$ then $Z_i = 0$ and hence $\hat D_i = 0$. Thus $\hat D$ is obtained by flipping the 1s in D that also correspond to 1s in Z. If $Z_i = 1$, i.e. $D_i = 1$, $\hat D_i = 0$, we will say that a deletion is reversed at position i. It is not hard to see that the process Z is stationary. (In fact $(X, D, Z, \hat D)$ are jointly stationary.) Define $z = P(Z_i = 1)$, where i is arbitrary.
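A finite-length version of this construction can be sketched as follows. The function name `reverse_dense_deletions` and the string/list encoding are assumptions of this example, and the sketch works on a finite block rather than on the stationary process: every run that suffers three or more deletions has all of its deletions reversed, and all other deletions are left untouched.

```python
def reverse_dense_deletions(x, deleted):
    """Return (d_hat, z) for input string x and deletion flags `deleted`.

    z marks reversed deletions (deleted positions in runs with >= 3 deletions);
    d_hat is the modified deletion pattern, i.e. `deleted` XOR z.
    """
    assert len(x) == len(deleted)
    z = [0] * len(x)
    start = 0
    for pos in range(1, len(x) + 1):
        if pos == len(x) or x[pos] != x[start]:
            # [start, pos) is one run of x
            if sum(deleted[start:pos]) >= 3:
                for t in range(start, pos):
                    z[t] = deleted[t]
            start = pos
    d_hat = [d ^ r for d, r in zip(deleted, z)]
    return d_hat, z
```

For example, with x = "0000111" and deletions at positions 0, 1, 2 and 4, the first run has three deletions and is restored in full, while the single deletion in the second run survives.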

The expected number of deletions reversed due to a run with length $\ell$ is bounded above by

$\ell d - \ell d (1-d)^{\ell-1} - 2\binom{\ell}{2} d^2 (1-d)^{\ell-2} \le \ell(\ell-1)(\ell-2) d^3 \le \ell^3 d^3$, (28)

using $(1-d)^{\ell-1} \ge 1 - (\ell-1)d$ and $(1-d)^{\ell-2} \ge 1 - (\ell-2)d$.

We know that each run has length at least 1. Thus, we have the following.

Fact 5.20. For arbitrary stationary process X, the probability z of a reversed deletion at an arbitrary position i is bounded as $z \le d^3 \mathbb{E}[L^3]$.
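The bound on the expected number of reversed deletions can be sanity-checked numerically. The sketch below assumes the reading of Eq. (28) in which the number K of deletions in a run of length l is Binomial(l, d), the expected number of reversed deletions is $\mathbb{E}[K \cdot 1\{K \ge 3\}]$, and the upper bounds are $l(l-1)(l-2)d^3 \le l^3 d^3$; the function names are hypothetical.

```python
from math import comb

def expected_reversed(l, d):
    """E[K * 1{K >= 3}] for K ~ Binomial(l, d)."""
    return sum(k * comb(l, k) * d ** k * (1 - d) ** (l - k) for k in range(3, l + 1))

def closed_form(l, d):
    """l*d minus the contributions of exactly one and exactly two deletions."""
    return l * d - l * d * (1 - d) ** (l - 1) - l * (l - 1) * d ** 2 * (1 - d) ** (l - 2)

def cubic_bound(l, d):
    """Upper bound l(l-1)(l-2) d^3 from the two Bernoulli inequalities."""
    return l * (l - 1) * (l - 2) * d ** 3

checks = [
    (l, d, expected_reversed(l, d), closed_form(l, d), cubic_bound(l, d))
    for l in range(1, 40)
    for d in (0.001, 0.01, 0.05)
]
```

In every tested case the direct sum matches the closed form and stays below the cubic bound, which in turn is at most $l^3 d^3$.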

Now E[L^] < d^'^E,[L] for X G 'Sii/d\ ■ Combining with Lemmas 15.31 and 15. 7[ we obtain: 

Fact 5.21. For any e > 0, there exists do = do(e) > and k < oo such that for any d < do the following occurs: 
Consider any X G ^[i/d] such that max{_ff (X), H{Y)} > 1 — d'^ . Then we have E[L"^] < nd''^"^ . 

Note that max{i7(X), i7(Y)} > I-d^-^/a holds for relevant processes X (see Lemma |4^ . justifying our assump- 
tion above. 

The next proposition follows immediately from Facts 15.201 and 15.211 

Proposition 5.22. For any e > 0, there exists do = do{e) > and k < oo such that for any d < do the following 
occurs: Consider any X G S\^i/d\ such that max{_ff(X), H(Y)} > f — d"'. Then we have z < nd^^^' . 

We now analyze the modified deletion process with the aim of estimating $H(\hat D^n | X^n, \hat Y, \hat K)$. Notice that for any run $R_i$, either all deletions in $R_i$ are reversed (in which case we say that $R_i$ suffers deletion reversal), or none of the deletions are reversed (in which case we say that $R_i$ is unaffected by reversal). It follows that

$H(\hat D^n | X^n, \hat Y, \hat K) = \sum_{j=1}^{M} H(\hat D(j) | X(j), \hat Y(j))$, (29)

where $\hat D(j)$ consists of the substring of $\hat D^n$ corresponding to $X(j)$. As before, when we study $H(\hat D^n | X^n, \hat Y, \hat K)/n$ in the limit $n \to \infty$, the terms corresponding to $j = 1$ and $j = M$ can be neglected, and we can perform the calculation by considering the stationary processes X, $\hat Y$ and $\hat D$.

Recall the definition of the parent runs $X(j)$ of a run $Y(j)$ for $j > 1$ from Section 5.2. Consider the possibilities for how many runs $X(j)$ contains, and the resultant ambiguity (or not) in the position of deletions (under $\hat D$) in the parent run(s):

A single parent run.

Let the parent run be $R_p$. The parent run should not disappear (see footnote 10); by definition it should contribute at least one bit to $\hat Y(j)$. The run $R_{p+1}$ should not disappear (else it is also a parent). $R_p$ can suffer 0, 1 or 2 deletions (else we have a deletion pattern not allowed under $\hat D$). The cases of 1 or 2 deletions lead to ambiguity in the location of deletions.

Note that if $R_{p-1}$ disappears then $R_{p-2}$ also disappears (else $R_{p-2}$, $R_{p-1}$ are also parents of $\hat Y(j)$), and so on.

A combination of three parent runs.

Let the parent runs be $R_p$, $R_{p+1}$ and $R_{p+2}$. We know that $R_p$ and $R_{p+3}$ did not disappear and $R_{p+1}$ has disappeared, by definition of $X(j)$ (cf. Table 3). If $R_p$ and $R_{p+2}$ suffer no deletions, this leads to no ambiguity in the location of deletions. Ambiguity can arise in case $R_p$ and $R_{p+2}$ suffer between one and four deletions in total.

Note that if $R_{p-1}$ disappears then $R_{p-2}$ also disappears, and so on.

A combination of $2k + 1$ parent runs, for $k = 2, 3, \ldots$.

Let the parent runs be $R_p, R_{p+1}, \ldots, R_{p+2k}$. The runs $R_{p+1}, R_{p+3}, \ldots, R_{p+2k-1}$ must disappear and $R_p$ does not disappear. The runs $R_p, R_{p+2}, \ldots, R_{p+2k}$ must suffer between one and $2(k+1)$ deletions in total for ambiguity to arise in the location of deletions.

Define

$p_{L(3)}(>1, l_2, l_3) = \sum_{l_1=2}^{\infty} p_{L(3)}(l_1, l_2, l_3)$,

$p_{L(3)}(>1, l_2, >1) = \sum_{l_1=2}^{\infty} \sum_{l_3=2}^{\infty} p_{L(3)}(l_1, l_2, l_3)$,

and so on.

The following lemma shows the utility of the modified deletion process. We obtain this result by adding the contributions of the cases enumerated above.

Lemma 5.23. There exists $d_0 > 0$ such that for any $d < d_0$ the following occurs: Consider any $X \in \mathcal{S}_{\lfloor 1/d \rfloor}$. Then

$\lim_{n\to\infty} \frac{1}{n} H(\hat D^n | X^n, \hat Y, \hat K) =$



, 00 

Y,PL{i)nogi 



1=2 

00 



I'm t. 



(0{(^)log(^)-Plog<} 



^;^Il{pi(3)(>i>^>i) nogZ-pi(3)(i,/,i) /log?} 



1=2 



+ ^ E {pL(3)(^0,l,y (?0 + ^2)log(;o + ^2)}+ ^ {pL(3)(l,l,yZ2l0g/2} +<5, (30) 
' yn>l,i2 1,1, b / 
(Footnote 10) We emphasize that we are referring here to deletions under $\hat D$.

where



~lld3log(l/d)E[L3] < ^ < 140^3 log(l/d)E[L3] 



(31) 



The proof of Lemma 5.23 is quite technical and is deferred to Appendix D.

Making use of the estimates of $p_{L(k)}(\cdot)$ derived in Section 5.1, we obtain the following corollary of Lemma 5.23. It is proved in Appendix D.

Corollary 5.24. For any e > 0, there exists da = (io(e) > and k < oo such that for any d < da the following 
occurs: Consider any X G Sli/d\ such that i7(X) > 1 — d^^*^ and max{i7(X), iJ(Y)} > 1 — d'^ for some 7 G (0,2). 
Then 

lim -i/(5"|x",y,i^) - {y2p^(i)iiogi} + d\s + L 

where \^\ < nd'^+'f-"/'^ . Recall that 

C3^U-l + f2'^-'{Q log (2) - log' + 3) log(Z - 1) + (; - 2) log(/ - 2) 




Note that with 7 = 2- e/2, we obtain |^| < «;(^^~^ 

We need to show that our estimate for the modified deletion process is also a good estimate for the original deletion process. The following simple fact helps us do this:

Fact 5.25. Suppose $U$, $\hat U$ and $V$ are random variables with the property that $U$ is a deterministic function of $\hat U$ and $V$, and also $\hat U$ is a deterministic function of $U$ and $V$. (Denote this property by $U \overset{V}{\longleftrightarrow} \hat U$.) Then

$|H(U) - H(\hat U)| \le H(V)$.

Proof. We have $H(U) \le H(\hat U, V) \le H(\hat U) + H(V)$. Similarly, $H(\hat U) \le H(U) + H(V)$. □
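Fact 5.25 can be illustrated on a toy example. The setup below is an assumption chosen for this sketch, not something taken from the paper: U and V are independent bits with the given biases, and $\hat U = U \oplus V$, so that each of $U$, $\hat U$ is a deterministic function of the other together with $V$.

```python
from math import log2

def entropy(dist):
    """Shannon entropy (bits) of a distribution given as {value: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def xor_example(pu=0.1, pv=0.3):
    """Return (H(U), H(U_hat), H(V)) for U ~ Bern(pu), V ~ Bern(pv), U_hat = U XOR V."""
    dist_u = {0: 1 - pu, 1: pu}
    dist_v = {0: 1 - pv, 1: pv}
    dist_u_hat = {0: 0.0, 1: 0.0}
    for u, p_u in dist_u.items():
        for v, p_v in dist_v.items():
            dist_u_hat[u ^ v] += p_u * p_v  # independence of U and V is assumed here
    return entropy(dist_u), entropy(dist_u_hat), entropy(dist_v)

h_u, h_u_hat, h_v = xor_example()
```

With these biases the entropies of $U$ and $\hat U$ differ by less than $H(V)$, as the fact requires.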

It is not hard to see that $(X^n, Y, K, D^n) \overset{Z^n}{\longleftrightarrow} (X^n, \hat Y, \hat K, \hat D^n)$ and $(X^n, Y, K) \overset{Z^n}{\longleftrightarrow} (X^n, \hat Y, \hat K)$. Using Fact 5.25, we obtain

$|H(D^n | X^n, Y, K) - H(\hat D^n | X^n, \hat Y, \hat K)| \le 2 H(Z^n) \le 2 n h(z)$. (32)

Combining Eq. (32) with Corollary 5.24, we obtain an estimate for the second term in Eq. (24). For future convenience, we form an estimate in terms of $q_L(\cdot)$ instead of $p_L(\cdot)$, using Lemma 5.12 to make the switch.

Corollary 5.26. For any e > 0, there exists do = do{e) > and k < 00 such that for any d < do the following occurs: 
Define £ = [41og(l/d)J . Consider any X G ^li/^j such that H{T) > 1 - d^-" and max{iJ(X), i7(Y)} > 1 - d^-^/^. 
Then 

£ £ 

lim iH(F(X"),if(X")|X")=-^^<j,.(0/log? + -^^gi(OZ 
n-s-oo n 2 ^ — ' 4 In 2 — ' 

1=2 1=1 

where $|\delta| \le \kappa d^{3-\epsilon}$. Recall $c_2 = \sum_{l=1}^{\infty} 2^{-l}\, l \ln l$.
Corollary 5.26 is also proved in Appendix D.
5.4 A self improving bound on H(Y)

Our next lemma constitutes a 'self-improving' bound on the closeness of $H(Y)$ to 1 and leads directly to Lemma 4.4.


Lemma 5.27. There exists a function (k, e) i— >■ do{K^e) > such that the following happens for any e > 0, and 
constants k > and 7 £ (1/2, 2). For any d < do and any X e S\^ifd\ such that 

/(X) > 1 - d\og{l/d) - Aid - Kd^-^'l^^ 

and H{Y) > 1 — d'^ , we have 

H{Y) > 1 - _ 

Proof. From Eq. (24) we have

$I(X) = \lim_{n\to\infty} \frac{1}{n} \{ H(Y(X^n)) - H(D^n) + H(D^n | X^n, Y, K) + H(K | X^n, Y) \}$

$= (1-d) H(Y) - h(d) + \lim_{n\to\infty} \frac{1}{n} \{ H(D^n | X^n, Y, K) + H(K | X^n, Y) \}$. (33)
Using Eq. ^ and Proposition [02l we have 

-|ij(i:'"|x",y,if)-ij(i5"|x",y,^)| < md^+'^iogn/d) . 

n ' ' 

It follows from H{X) > /(X) and our assumed lower bound on /(X), that H{X) > 1 — d^^'^ for some e > 0. Using 
Corollary 15. 24) |/i(X) — 2| < K2d''^'^ from Lemma [5.12( 11). and Lemmas 15.12( 1) and 15.71 to control pi(-), we have 

1 d °° 



lim -i/(i5"|X",f,X) = -(y 2^' /log?) +(^1, 



n— foo 72 _ 

Z=2 



where \Si\ < K3d^+-^/^-'/'^. 
Lemma 15.181 gives 



lim i7(if|X",r) < Kid^+^^^-'/\ 



We used here 7 < 2. 

Plugging back into Eq. ([55]) . we obtain 

/(X) < i7(Y) - d\og{l/d) - Aid + K5di+''/2-'^/4 . 

The result follows from the assumption on /(X). □ 



5.5 Auxiliary lemmas for our lower bound 

Lemma 5.28. Recall that $X^\dagger$ is the process consisting of i.i.d. runs with distribution $p_L^\dagger(l) = 2^{-l}(1 + d(l \ln l - c_2 l/2))$ (cf. Lemma 4.1). There exists $d_0 > 0$ such that, for any $d < d_0$, we have the following: For any integer i and any $x_{-\infty}^{i-1}$, we have

$|P(X^\dagger_i = 1 \,|\, (X^\dagger)_{-\infty}^{i-1} = x_{-\infty}^{i-1}) - 1/2| \le 0.05$.

Proof. Without loss of generality, suppose $X_{i-1} = 1$. Also, suppose that it is the lth consecutive 1 to occur. Now, since the runs' starting points form a renewal process under $X^\dagger$, we have



p{xj 


= o|(xt)!r^ 


— oo } 


piii) 


F{xj 




— oo } 





A little calculus yields 



i'>i 



where |77i_/| < kiI for some k < oo. In comparison, p\^{l) = 2 ' (1 + d{Zlog^ — C2//2}). 
Case (i): I < l/^d. 

In this case, we have p[(0 = 2-'(l + ri2,i) with \r]2,i\ < * and E;'>;Pl('') = ^-'(l + 773,/) with \ri3j\ < d"-^, for 
sufficiently small d. The result follows. 
Case (ii): / > l/Vd. 

In this case, {/log l + rjij} = {I log I — C2l/2}{1 + 774,/), where 1774,1 1 < 0.01 provided d is small enough. It follows that 



Plil) 



j:,>iplii': 



- 1 



< 0.02. 



The result follows. 



□ 



Lemma 5.29. Let q\{-) be the run length distribution 0/ corresponding to input X^. Then there exists do (same 
as in Lemma \5. 28]) such that, for any d < do, we have qL{l) < (3/4)' for all I. 

Proof. It follows from Lemma [5.281 that for any y!l^, we have 

|p{y/ = = 2/!roi} - 1/2| < o.i , 

for d < do. This gives <7l(0 < (0.45/0.55)', implying the result. □ 



5.6 Proofs of Lemmas 4.1, 4.4, 4.5 and 4.6

We first prove Lemma 4.6, followed by Lemmas 4.1, 4.4 and 4.5.

Proof of Lemma \4.6] We construct X £ Sl» from X as follows: Suppose a super-run starts at Xj and continues 
until Xj^i^*. We flip one or both of Xj+^.+i and Xj^L*^2 such that the super-run ends at Xj^^*. (It is easy to 
verify that this can always be done. If multiple different choices work, then pick an arbitrary one.) The density of 
flipped bits in X is upper bounded by a = 2E[LI(L > L*)]/L* . The expected fraction of bits in the channel output 
Y = Y{X") that have been flipped relative to F = Y{X") (output of the same channel realization with different 
input) is also at most a. Let F — F{X, D) be the binary vector having the same length as Y, with a 1 wherever the 
corresponding bit in Y is flipped relative to Y, and Os elsewhere. The expected fraction of I's in F is at most a. 
Therefore 



H{F) <n{l- d)h{a) +\og{n + l) . (34) 



Recah Fact lOSl Notice that Y w Y, whence 



\HiY)-HiY)\<H{F). (35) 
Further, X — X — X" — Y form a Markov chain, and X, X" are deterministic functions of X. Hence, H{Y\X^^) = 
H{Y\±). Similarly, H{Y\X'') = H{Y\X). Therefore (the second step is analogous to Eq. ((55|) ) 

\H{Y\X'') - H(Y\X'')\ = \HiY\X) - H{Y\X)\ < H{F) . (36) 

It follows from Lemma 15.161 and L* > 2'-f\og{l/d) that a < 80(P/L* for sufficiently small d. Hence, h{a) < 
dT-MogLVL* for d < do(e), for some do(e) > 0. Now Eqs. ^ and ^ gives Eq. ([8]), where as Eq. dH) follows by 
combining Eqs. §5\i and §^ to bound |/(X) - /(X)|. 

□ 

Proof of Lemma \4-1\ We first make some preliminary observations. Direct calculation leads to i/(X^) = H{p]^) / 
1 - 0(d2), and lAi(Xt) - 2| = 0(d). From Lemma El^n) , we deduce |/i(Yt) - 2| = 0(d). 

Since X^ consists of independent runs, the same is true for Y^. Hence, recalling the notation g£(Z) = 2^', we 
have 

i7(Yt) = i7(gi)/M(Yt) = 1 - i?(g[||{2-'})/M(Yt) 

^ OO 

Define ^ = [41og(l/(i)J . It follows from Lemma [OHl that J2Zi+i ^llW = 0{d^), leading to 

H{Y^) > 1 - ^ ^ gi(0( log(?l(0 + /) + 0(d3) . (37) 
Now, from Lemma IS.Qf i). we know that 

klil) - pI{1)\ < ^2d'-'/' (38) 

for / < i. 

A Taylor approximation yields 



^ <?!(/)( log gl(/) + 1) = -Lj2( (^KO - 2-') + 2'-i (gi(Z) - 2"') ' ) + Oid'- 

1 OO f)/ — 1 ^ 2 

,i5i:(ri<o-2i+^i:(,i«)-2i +o(i 



^^2-' (-C2//2 + nn0' + O(d^-^) 



(=1 

^^2-' (-C2//2 + nnOVO(rf^ 
(=1 



i3-e^ 



21n2 



1=1 

Plugging back into Eq. ([37]) and using |/i(Y^) — 2| — 0{d), we obtain 

d2 /3 

/=1 



^c2 + ^2-' ((/InO'-ca/'lnm +0(^3-^). (39) 
We construct £ ^li/d\ from X^^ by flipping a few bits as in the proof of Lemma [4.61 The fraction of flipped 
bits, both in Xt and in Y\ is at most a = 2E[Zl(L > < 0{2-'^/'^) = 0{d*). Proceeding as in the proof 

of Lemma 14. 6[ cf. Eqs. and we have 



- < nh{a) = nO{S) . 



(41) 



For each bit that is flipped, the number of runs in Y can change by at most 2, and the number of runs of a particular 
length can change by at most 3. It follows that 



1 



1 



and, for any positive integer 



We then deduce from the above that 



M(Yt) ^(Yt) 



< 2a = 0[d'^) 



<3a = 0{(f) . 



M(Yt) ^(Yt) 

/i(Yt)-^(Yt) =0{d^), 



and for any Z > 0, 



< Kid"* 



where q\{-) is the distribution of runs under Y . From Eq. (p8|) . it follows that for I < £, 



(42) 



We have 77(rt|(i-t)n) ^ i/(yt^ /i:t|(xt)n) „ H{KmX^y\Y'i) where = K{{X^y'). We use Corollary [OSl 
and Lemma 15.181 to arrive at 

t I 

hm \h{y\x^T) = diog(i/d) - ^E4(0 ^iog' + 2i?;^E4(0^ 



i=2 



1 - 



C2\ d 



2 y ln2 



C3 + C4 



2 In 2 



(=1 

+ 0(^3-^) . 



(43) 



Combining Eqs. (|4Tj), glj) and (I42j), we obtain. 



1=2 



l = \ 

f 1 _ A _ ("^3 + C4 + + 0(d3-^ 

V 2 y ln2 21n2 / ^ 



A calculation yields 

lim -i7(rt|(X'f)") 
dlog(l/d)+(l- 



C2\ d 

2 y ln2 



C3 + C4 



1 

4 In 2 



2 + 3c^ + 2 ^ 2-\[l In 0^ - cj^^ In Z) 



(44) 
Finally, 



The result now follows by using the estimates in Eqs. (p9| and Eq. (144 
We obtain 



/(Xt) = (1 - d)H(Yt) + lim -H(yt|(xt)"). 



/(X^) > 1 - d\og{l/d) - Aid + Aad^ + 0{d^-') , 



where 



Ai = log(2e) 



2 In 2 
1 / 3 



"^^ = -4lb ((Hn/f-cPln; 

\ 1=1 / 

1 / °° 

— 2 + 3c^ + 2^2-' (^{llnlf - C2l^lnl 

^ \ 1=1 

\ 1=1 1=1 



+ 41n^ 



■ 41n: 



□ 



Proof of Lemma 4.4. Let $\gamma^* = \sup\{\gamma : H(Y) > 1 - d^{\gamma}\}$. Then $\gamma^* \ge 1 + \gamma^*/2 - \epsilon/2$ must hold, else Lemma 5.27 leads to a contradiction. It follows that $\gamma^* \ge 2 - \epsilon$, hence the result.

We use here the fact that $d_0$ in Lemma 5.27 does not depend on $\gamma$. □

Proof of Lemma \4-.5\ Fix e > 0. Consider any X G ^i^i/^jj . Assume 

/(X) > 1 - dl0g(l/d) - Aid - d2-(e/8) _ 

(If not, we are done, for small enough d.) 

By Lemma [44l we know that i?(Y) > 1 - d^"^''/^). Now, we use Lemma [5T9l Corollary 15.261 and Lemma [5T8l 
for the three terms in Eq. ([M)) . to arrive at 



00 

/(X) < l-dlog(l/d) - - ^ qL{l){\ogqL{l) + 



2 

1=1 

41og(l/d) 41og(l/d) 

+ f E 9L(0^1og/-^ E 9L(0^ + £irf + 52d' + '«irf'"% (45) 

/=2 (=1 

where ci,C2 can be explicitly computed in terms of constants above, and ki < 00 is independent of qL- The precise 
value of these constants is irrelevant for the argument below. 

Since we know that X S ^^i/dj. Lemma [5.131 tells us that the tail of q^ is small. Define t= [S/dJ. We deduce 
that 

00 00 

1=1+1 i=e+i 
for small enough d. From elementary calculus, we obtain 

'T,Ze+ilL{l) 



i=i+i i=e+i 

> £d^ + dHogd^ > d"^-'/^ . (46) 
From Lemma 15.31 we deduce 



^ qL{l)l < d^-' . (47) 

i=41og(l/(i) 



Plugging the bounds in Eqs. P5|) . (|T7)) into Eq. (HSl) . we obtain 

1 ^ 

/(X) < l-dlog(l/d)--^gi(/)(loggi(0+0 

1=1 

d v-^ , do 



/=2 i=l 



where K2 < oo is independent of q^. 

Now we simply maximize the bound over 'distributions' {9L(0}f=i satisfying X]i<f 9^(0 — li to arrive at an 
optimal distribution 



for I < £, where B{d) is such that J^KelliO — 1' ^^d S = C2/ln2. Note that q1{l) has no dependence on the 
process X we started with. 
It is easy to verify that 



This leads to 



We now have 



B{d) = 1 + 0(^2-^/2). 

2-^ {l + d{~C2l/2 + l\nl) + 0{d^-'/^)) ioTl<l 
' ^ { 2-'/20(l) otherwise 



1 ^ 

/(X) < l-dlog(l/d) - - ^ 92(0 (log 92(0 + 



2 

(=1 



+ ^ E 9^(0 Hog / - ^ ^ g2(0^ + cid + d2d' + K,d'~^ , (48) 

1=2 1=1 

for some K4 < oo. Again, calculus yields 

L61og(l/d)J r/2 °° \ 

J2 92(0 (log 92(0 + = ^ 2^2 + E ((""0' - In/) + Oid'-^) . 

1=1 \ 1=1 I 

We substitute in Eq. to get the result. □ 



6 Discussion 

The previous best lower bounds on the capacity of the deletion channel were derived using first order Markov sources. In contrast, we found that the optimal coding scheme for small d consists of independent runs with run length distribution $p_L^\dagger(l) = 2^{-l}(1 + d(l \ln l - c_2 l/2))$. This leads to the natural question: How much 'loss' do we incur if we are only allowed to use an input distribution that is a first order Markov source?
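As a quick consistency check on the run length distribution quoted above, the sketch below evaluates $c_2$ and verifies that $p_L^\dagger$ sums to one. It assumes natural logarithms in both $p_L^\dagger$ and $c_2 = \sum_{l \ge 1} 2^{-l}\, l \ln l$ (the reading under which the normalization works out), and it truncates the sums at a hypothetical `max_len`; the function names are not from the paper.

```python
from math import log

def c2(max_len=200):
    """c_2 = sum_{l>=1} 2^{-l} * l * ln(l), truncated at max_len."""
    return sum(2.0 ** (-l) * l * log(l) for l in range(1, max_len + 1))

def p_dagger(l, d, c2_value):
    """Perturbed run length distribution 2^{-l} (1 + d (l ln l - c_2 l / 2))."""
    return 2.0 ** (-l) * (1.0 + d * (l * log(l) - c2_value * l / 2.0))

c2_value = c2()  # roughly 1.78628
total_mass = sum(p_dagger(l, 0.05, c2_value) for l in range(1, 201))
```

The correction term integrates to zero because $\sum_l 2^{-l} l = 2$, so the total mass stays at 1 for any d; the perturbation only shifts weight toward longer runs.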

The following theorem is fairly straightforward to prove using the results we have derived. It provides an upper 
bound on the rate achievable with a Markov source, and also a precise analytical characterization of the optimal 
Markov source for small d. 
Theorem 6.1. Fix any $\epsilon > 0$. Consider the class of first order Markov sources. There exist $\kappa < \infty$ and $d_0 = d_0(\epsilon) > 0$ such that, for any X in this class,

$I(X) \le 1 - d\log(1/d) - A_1 d + A_2' d^2 + \kappa d^{3-\epsilon}$

holds for any $d < d_0$, where

A'^ = 2c^/ln2 + C3 + C4 + l/(21n2), 

^5^^E{^(^~3)2-'logO . 
1=1 

Denote the symmetric first order Markov source with $p(d) = P(X_i = b \,|\, X_{i-1} = b) = 1/2 + c_5 d$ for $b \in \{0, 1\}$, by X.
We have

$I(X) \ge 1 - d\log(1/d) - A_1 d + A_2' d^2 - \kappa d^{3-\epsilon}$.

Numerical evaluation yields $A_2' \approx 1.57796256$ and $c_5 \approx 0.60409609$. We have $A_2 - A_2' \approx 0.10018339$, implying that the restriction to Markov sources leads to a rate loss of $0.10018339\, d^2$ bits per channel use, to leading order, with respect to the optimal coding scheme.

Remark 6.2. Lower bounds are derived in [2] using Markov sources and 'jigsaw' decoding. In this case we can show that the best achievable rate is

$1 - d\log(1/d) - A_1 d + (A_2' - c_4) d^2 + O(d^{3-\epsilon})$,

and that X achieves this rate to within $O(d^{3-\epsilon})$. Thus, the lower bounds in [2] are off by $(A_2 - A_2' + c_4) d^2 \approx 0.904\, d^2$, to leading order.

Remark 6.3. The utility of our asymptotic analysis is confirmed by considering the prescription for the optimal Markov source X provided by Theorem 6.1. Drinea and Mitzenmacher optimized numerically over Markov sources, obtaining, for instance, p = 0.53 for d = 0.05. Our analytical prediction yields $p(0.05) \approx 0.530204804$.
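The prediction in Remark 6.3 follows directly from $p(d) = 1/2 + c_5 d$ with the quoted value of $c_5$, as the sketch below shows. The second function is more speculative: the series for $c_5$ is garbled in this copy, and `c5_from_series` uses one guessed reading, $c_5 = \frac{1}{4}\sum_{l \ge 1} l(l-3)\, 2^{-l} \ln l$, chosen only because it reproduces the quoted numerical value; it is not confirmed by the text. All function names are hypothetical.

```python
from math import log

C5_QUOTED = 0.60409609  # numerical value quoted in Theorem 6.1

def markov_repeat_probability(d, c5=C5_QUOTED):
    """p(d) = P(X_i = b | X_{i-1} = b) = 1/2 + c_5 d for the optimal Markov source."""
    return 0.5 + c5 * d

def c5_from_series(max_len=200):
    """Guessed closed form for c_5 (an assumption, see the lead-in), truncated at max_len."""
    return 0.25 * sum(l * (l - 3) * 2.0 ** (-l) * log(l) for l in range(1, max_len + 1))

p_at_005 = markov_repeat_probability(0.05)  # about 0.5302048
c5_guess = c5_from_series()                 # about 0.6041
```

The value at d = 0.05 rounds to the p = 0.53 found numerically by Drinea and Mitzenmacher, which is the agreement the remark points to.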

In comparison, we have shown that $I(X^\dagger) = C - O(d^{3-\epsilon})$. In fact, we conjecture that an even stronger bound holds.

Conjecture 6.4. $I(X^\dagger) = C - \Theta(d^4)$.

The reasoning behind this conjecture is as follows: We expect the next order correction to the optimal input distribution to be quadratic in d. If $I(X)$ is a 'smooth' function of the input distribution, a change of order $d^2$ in the input distribution should imply that $I(X)$ decreases by an amount $\Theta((d^2)^2) = \Theta(d^4)$ below capacity.

Our work leaves several open questions: 

• Can the capacity be expanded as

$C = 1 - d\log(1/d) - A_1 d + A_2 d^2 + A_3 d^3 + A_4 d^4 + \ldots$

for small d? If yes, is this series convergent? In other words, is there a $d_0 > 0$ such that for all $d < d_0$, the infinite sum on the right has terms that decay exponentially in magnitude? We expect that the answer to both these questions is in the affirmative. We provide a very coarse reasoning for this below.

The analysis carried out in the present paper suggests that the optimal input distribution for $d < d_0$ does not have 'long range dependence'. In particular, we expect correlations to decay exponentially in the distance between bits. Suppose we are computing the contribution to capacity due to 'clusters' of k nearby deletions. These 'clusters' should correspond to k deletions occurring within $2k + 1$ consecutive runs. This should give us a term $A_k d^k$, with the error being bounded by the probability of seeing $(k + 1)$ deletions in $2k + 1$ consecutive runs. This error should decay exponentially in k for $d < d_0$, assuming our hypothesis on correlation decay.
• What is the next order correction to the optimal input distribution? It appears that this correction should be of order $d^2$ and should involve non-trivial dependence between the run length distributions of consecutive runs. It would be illuminating to shed light on the type of dependence that would be most beneficial in terms of maximizing the rate $I(X)$ achieved. Moreover, it appears that computing this correction heuristically may, in fact, be tractable, using some of the estimates derived in this work.

• Can the results here be generalized to other channel models of insertions/deletions? 

• What about the deletion channel in the large deletion probability regime, i.e., $d \to 1$? What is the best coding scheme in this limit? It seems this limit may be harder to analyze than the $d \to 0$ limit studied in the present work: For d = 1 the channel capacity is 0 and there is no particular coding scheme that we can hope to modify slightly in order to achieve good performance for d close to 1. This is in contrast to the case d = 0, where we know that the i.i.d. Bernoulli(1/2) input achieves capacity.

• Can a similar series expansion approach be used to 'solve' other hard channels in particular asymptotic regimes 
of interest? 

• We did not compute explicitly the constants in the error terms of our upper and lower bounds, thus preventing 
us from numerically evaluating our upper and lower bounds on capacity (cf. Remark II. 2p . It would be 
interesting to compute constants for the error terms leading to improved numerical bounds on capacity. 

A Proofs of Preliminary results 

Proof of Theorem 2.1. This is just a reformulation of Theorem 1 in [3], to which we add the remark $C = \inf_{n \ge 1} C_n$, 
which is of independent interest. In order to prove this fact, consider the channel $W_{m+n}$ and let 
$X^{m+n} = (X_1^m, X_{m+1}^{m+n})$ be its input. The channel $W_{m+n}$ can be realized as follows. First the input is passed 
through a channel $\widetilde{W}_{m+n}$ that introduces deletions independently in the two strings $X_1^m$ and $X_{m+1}^{m+n}$ 
and outputs $\widetilde{Y}(X^{m+n}) = (Y(X_1^m), |, Y(X_{m+1}^{m+n}))$, where $|$ is a marker. Then the marker is removed. 

This construction proves that $W_{m+n}$ is physically degraded with respect to $\widetilde{W}_{m+n}$, whence 

$$(m + n)\, C_{m+n} \le \max_{p_{X^{m+n}}} I\big(X^{m+n}; \widetilde{Y}(X^{m+n})\big) \le m\, C_m + n\, C_n .$$

Here the last inequality follows from the fact that $\widetilde{W}_{m+n}$ is the product of two independent channels, and hence the 
mutual information is maximized by a product input distribution. 

Therefore the sequence $\{n C_n\}_{n \ge 1}$ is subadditive, and the claim follows from Fekete's lemma. □ 
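
The degradation step in the proof above can be illustrated with a short sketch. The function names (`delete`, `marked_output`, `remove_marker`) are hypothetical and not from the paper; the sketch only shows that, for any fixed deletion pattern, stripping the marker from the output of the auxiliary channel gives back the ordinary deletion-channel output.

```python
import random

def delete(bits, mask):
    """Apply a deletion pattern: drop bits[i] whenever mask[i] == 1."""
    return [b for b, m in zip(bits, mask) if m == 0]

def marked_output(bits, mask, m):
    """Auxiliary channel: deletions act separately on the first m bits and
    on the remaining bits, and a marker '|' separates the two outputs."""
    return delete(bits[:m], mask[:m]) + ["|"] + delete(bits[m:], mask[m:])

def remove_marker(symbols):
    """Degrade the marked output to the plain deletion-channel output."""
    return [s for s in symbols if s != "|"]

if __name__ == "__main__":
    rng = random.Random(0)
    m, n, d = 5, 7, 0.3
    x = [rng.randint(0, 1) for _ in range(m + n)]
    mask = [1 if rng.random() < d else 0 for _ in range(m + n)]
    y_marked = marked_output(x, mask, m)
    print(y_marked, remove_marker(y_marked) == delete(x, mask))
```

Because the plain output is a deterministic function of the marked output, the data processing inequality gives the first inequality in the display.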

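Proof of Lemma 2.2. Take any stationary X, and let $I_n = I(X^n; Y(X^n))$. Notice that 
$Y(X_1^n) - X_1^n - X_{n+1}^{n+m} - Y(X_{n+1}^{n+m})$ form a Markov chain. Define $\widetilde{Y}(X^{n+m})$ as in the proof of 
Theorem 2.1. We therefore have 

$$I_{n+m} \le I\big(X^{n+m}; \widetilde{Y}(X^{n+m})\big) \le I\big(X_1^n; Y(X_1^n)\big) + I\big(X_{n+1}^{n+m}; Y(X_{n+1}^{n+m})\big) = I_n + I_m$$

(the last identity follows by stationarity of X). Thus $I_{m+n} \le I_n + I_m$ and the limit $\lim_{n\to\infty} I_n/n$ exists by 
Fekete's lemma, and is equal to $\inf_{n \ge 1} I_n/n$. 
Clearly, $I_n/n \le C_n$ for all $n$. Fix any $\epsilon > 0$. We will construct a process X such that 

$$I_N/N \ge C - \epsilon \qquad \forall\, N > N_0(\epsilon), \tag{49}$$

thus proving our claim. 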

Fix $n$ such that $C_n \ge C - \epsilon/2$. Construct X with i.i.d. blocks of length $n$ with common distribution $p^*(n)$ that 
achieves the supremum in the definition of $C_n$. In order to make this process stationary, we make the first complete 
block to the right of position 0 start at position $s$ uniformly random in $\{1, 2, \dots, n\}$. We call the position $s$ the 
offset. The resulting process is clearly stationary and ergodic. 

Now consider $N = kn + r$ for some $k \in \mathbb{N}$ and $r \in \{0, 1, \dots, n - 1\}$. The vector $X_1^N$ contains at least $k - 1$ 
complete blocks of size $n$, call them $x(1), x(2), \dots, x(k-1)$ with $x(i) \sim p^*(n)$. The block $x(1)$ starts at position 
$s$. There will be further $r + n - s + 1$ bits at the end, so that $X_1^N = (X_1^{s-1}, x(1), x(2), \dots, x(k-1), X_{s+(k-1)n}^N)$. We 
write $y(i)$ for $Y(x(i))$. Given the output $Y$, we define 
$\widetilde{Y} = (Y(X_1^{s-1}) \,|\, y(1) \,|\, y(2) \,|\, \dots \,|\, y(k-1) \,|\, Y(X_{s+(k-1)n}^N))$, by 
introducing $k$ synchronization symbols $|$. There are at most $(n + 1)^k$ possibilities for $\widetilde{Y}$ given $Y$ (corresponding to 
potential placements of synchronization symbols). Therefore we have 

$$H(Y) = H(\widetilde{Y}) - H(\widetilde{Y}|Y) \ge H(\widetilde{Y}) - \log\big((n+1)^k\big) \ge (k-1)\, H(y(1)) - k \log(n+1),$$

where we used the fact that the $(x(i), y(i))$'s are i.i.d. Further 

$$H(Y|X^N) \le H(\widetilde{Y}|X^N) \le (k-1)\, H(y(1)|x(1)) + 2n,$$

where the last term accounts for bits outside the blocks. We conclude that 

$$I(X^N; Y(X^N)) = H(Y) - H(Y|X^N) \ge (k-1)\, n\, C_n - k \log(n+1) - 2n \ge N (C_n - \epsilon/2)$$

provided $\log(n+1)/n < \epsilon/10$ and $N > N_0 = 10n/\epsilon$. Since $C_n \ge C - \epsilon/2$, this in turn implies Eq. (49). □ 
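
The last inequality of the proof is elementary arithmetic, and it can be spot-checked numerically. The sketch below is an illustration only: the helper names are hypothetical, logarithms are taken base 2 (an assumption), and $C_n$ is treated as a free parameter in $[0, 1]$.

```python
import math

def bound_holds(n, k, r, c_n, eps):
    """Check (k-1) n C_n - k log(n+1) - 2n >= N (C_n - eps/2), N = k n + r."""
    N = k * n + r
    lhs = (k - 1) * n * c_n - k * math.log2(n + 1) - 2 * n
    return lhs >= N * (c_n - eps / 2)

def check_grid(eps, n_max=400, k_max=60):
    """Test the bound on a grid restricted to the two stated conditions."""
    checked = 0
    for n in range(1, n_max):
        if math.log2(n + 1) / n >= eps / 10:
            continue  # the proof requires log(n+1)/n < eps/10
        for k in range(1, k_max):
            for r in (0, n // 2, n - 1):
                if k * n + r <= 10 * n / eps:
                    continue  # the proof requires N > N_0 = 10 n / eps
                for c_n in (0.0, 0.25, 0.5, 0.75, 1.0):
                    assert bound_holds(n, k, r, c_n, eps)
                    checked += 1
    return checked

if __name__ == "__main__":
    print(check_grid(0.5), "grid points checked")
```

The reason it works: the left side falls short of $N C_n$ by at most $(n + r) C_n + k\log(n+1) + 2n \le 4n + k\log(n+1)$, and each of the two pieces is at most a fixed fraction of $N\epsilon$ under the stated conditions.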

B Proofs of Lemmas in Section 5.1 

Proof of Lemma \5.3[ Combining (|10p , Lemma 15.11 and (1141) it follows that for small enough d, we must have 

DipL\\pl)<3d^ (50) 
to achieve H{X) >l-d^. Now define A = J^Zio ^Pl{1)- Take a = e^'^. We have 

a' (1 - aV 

l=lo 

for sufficiently small d, since a"'" ~ exp {|/31ogc?}. Thus, 



>A-d^ 



lei 



where I — {I : I > Io,pl{1) > a 
This yields, 



^PL(01ogpj^ > Y.lpL{l)\og^> \og{2/a){A~d^) (51) 
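It remains to show that the sum of terms from outside $\mathcal{I}$ is not too small. By Markov inequality, we have 

$$\sum_{l \in \mathcal{I}} p_L(l) \le A/l_0 \;\Longrightarrow\; \sum_{l \notin \mathcal{I}} p_L(l) \ge 1 - A/l_0 . \tag{52}$$

With a fixed sum constraint on $\{p_L(l),\, l \notin \mathcal{I}\}$, the smallest value of $\sum_{l \notin \mathcal{I}} p_L(l) \log \frac{p_L(l)}{p_L^*(l)}$ is achieved when 

$$\frac{p_L(l)}{p_L^*(l)} = \frac{\sum_{l' \notin \mathcal{I}} p_L(l')}{\sum_{l' \notin \mathcal{I}} p_L^*(l')} \qquad \forall\, l \notin \mathcal{I}. \tag{53}$$

Note that this ratio is smaller than 1. It follows from (53) and (52) that for small $d$, 

$$\sum_{l \notin \mathcal{I}} p_L(l) \log \frac{p_L(l)}{p_L^*(l)} \ge \log\Big(\sum_{l \notin \mathcal{I}} p_L(l)\Big) \ge -2A/l_0 \tag{54}$$

since we know that $A \le \mu(X) \le 3$, and hence $A/l_0 \le 1/10$. The lemma follows by combining (51), (54) and 
the bound (50) on $D(p_L \| p_L^*)$. 

□ 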

Proof of Corollary \5.4\ Clearly Li + . . . + > fcZ* occurs only if at least one of the L^'s is at least Also, the 
distribution PL(fe) has a marginal pL for each individual Li. We have 

^ (/i + . . . + Zfc)Pi(fe)(/i, . . . 

/i+...+/fc>fc;. 
fc 

<^ ^ is the largest] fc;jPi(fc)(/i,...,Zfc) 

i=i /i+...+;fc>fe/, 

k oo 



1 = 1, 

The result now follows from the first inequality in Lemma 5.3. □ 

Proof of Lemma 5.5. Repeat proof of Lemma 5.2. □ 

Proof of Proposition 5.6. A time shift by a constant in Y corresponds to a time shift by a random amount in X. 
The random shift in X depends only on the deletion process $D$ and is hence independent of X. Also, $D$ is independent 
and identically distributed. Thus, stationarity of X implies stationarity of Y. □ 

Proof of Lemma \5?l\ Consider a run R of length / > 21q in X. With probability at least (1 — d)^, the runs bordering 
R do not disappear due to deletions. Independently, with probability P [Binomial ( 1 — d) > 1/2] at least half the 
bits of R survive deletion. Thus, for small d, with probability at least 1/2, R leads to a run of length at least 1/2 in 
Y. Moreover, runs can only disappear in going from X to Y. It follows that 



l=lo 1=21 

From Lemma 15.31 applied to Y, we know that 



l=lo 

The result follows. □ 
Proof of Corollary \5.8[ Analogous to proof of Corollarv l5.4l □ 

Proof of Lemma 5.9. We adopt two conventions. First, when we use the $O(\cdot)$ or the notation, the constant 
involved does not depend on the particular X, Y under consideration. Second, we use 'typical' in this proof to refer 
to events having a probability for some S > 0. Thus, an event with probability 2(P is not typical, but an 

event with probability d^'^ is typical. 

We ignore boundary effects due to runs at the beginning and end. 

First, we estimate the factor due to disappearance of runs in moving from X to Y. Define 

$$r(X) = \lim_{n \to \infty} \frac{\text{Number of runs in } Y(X^n)}{\text{Number of runs in } X^n} .$$

We have almost sure convergence of this ratio to a constant value due to ergodicity. 

Runs disappear typically due to runs of length 1 being deleted, and the runs at each end being fused with each 
other (i.e. neither of them is deleted). Such an event reduces the number of runs by 2. Non-typical run deletions lead 
to a correction factor that is $O(d^2)$. Hence, the expected number of runs in Y per run in X is $1 - 2p_L(1)d + O(d^2)$. 
It follows from a limiting argument that 

$$r = 1 - 2p_L(1)\, d + O(d^2) \tag{55}$$
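
As a sanity check of Eq. (55), the run ratio can be estimated by simulation. The sketch below is not part of the proof: it uses an i.i.d. Bernoulli(1/2) input, for which $p_L(1) = 1/2$ and Eq. (55) predicts $r \approx 1 - d$; the function names and the sample size are arbitrary choices.

```python
import random

def count_runs(bits):
    """Number of maximal blocks of equal consecutive bits."""
    if not bits:
        return 0
    return 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)

def run_ratio(n, d, seed=0):
    """Empirical (runs in Y) / (runs in X) for an i.i.d. Bernoulli(1/2)
    input of length n sent through a deletion channel with parameter d."""
    rng = random.Random(seed)
    x = [rng.getrandbits(1) for _ in range(n)]
    y = [b for b in x if rng.random() >= d]  # each bit is kept w.p. 1 - d
    return count_runs(y) / count_runs(x)

if __name__ == "__main__":
    for d in (0.01, 0.05, 0.1):
        print(d, run_ratio(200_000, d), 1 - 2 * 0.5 * d)
```

For this particular input the output is again i.i.d. Bernoulli(1/2), so the ratio concentrates around $1 - d$ without any higher-order correction; for other stationary inputs only the first-order prediction should be expected to match.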
In this proof, we make use of the following implication of Lemma 5.5: 

$$\Big| p_{L(k)}(l_1, \dots, l_k) - 2^{-\sum_{i=1}^k l_i} \Big| \le \kappa' \sqrt{k}\, d^{\beta/2} \tag{56}$$

We immediately have $p_L(1) = 1/2 + O(d^{\beta/2})$ and hence $r = 1 - d + O(d^{1+\beta/2})$. 

Consider $q_L(1)$. Blocks of length 1 in Y typically arise due to blocks in X of length 1 or 2. In case of a block 
of length 1, we require that it isn't deleted, and also that bordering blocks are not deleted. Consider a randomly 
selected run in X (formally, we pick a run uniformly at random in $X^n$ and then take the limit $n \to \infty$). The run 
has length $L = 1$ with probability $p_L(1)$. Define 

• $E_1 \equiv$ No bordering block of length 1. We have $P[E_1, L = 1] = (1/8) + O(d^{\beta/2})$. 

• $E_2 \equiv$ One bordering block of length 1. We have $P[E_2, L = 1] = (1/4) + O(d^{\beta/2})$. 

• $E_3 \equiv$ Two bordering blocks of length 1. We have $P[E_3, L = 1] = (1/8) + O(d^{\beta/2})$. 

Probabilities were estimated using $p_L(1) = 1/2 + O(d^{\beta/2})$, $p_{L(2)}(1, 1) = 1/4 + O(d^{\beta/2})$ and 
$p_{L(3)}(1, 1, 1) = 1/8 + O(d^{\beta/2})$, and then the immediate consequences $p_{L(3)}(1, 1, {>}1) = 1/8 + O(d^{\beta/2})$, 
$p_{L(3)}({>}1, 1, 1) = 1/8 + O(d^{\beta/2})$ and $p_{L(3)}({>}1, 1, {>}1) = 1/8 + O(d^{\beta/2})$. We made use of Eq. (56). 
The probability of a length-1 run in Y arising from a block of length 1 is 

$$(1 - d)\big\{ P[E_1, L = 1](1 - O(d^2)) + P[E_2, L = 1](1 - d)(1 - O(d^2)) + P[E_3, L = 1](1 - d)^2 \big\} = p_L(1)(1 - 2d) + O(d^{1+\beta/2}) .$$

The probability of arising from a block of length 2 is $p_L(2)\, 2d + O(d^2) = d/2 + O(d^{1+\beta/2})$, using Eq. (56). It follows that 

$$q_L(1) = \frac{p_L(1)(1 - 2d) + d/2 + O(d^{1+\beta/2})}{r} = p_L(1) + O(d^{1+\beta/2})$$

as required. 

Now consider $q_L(l)$ for $1 < l < \kappa \log(1/d)$. Typical modes of creation of such a run in Y are: 

1. Run of length $l$ in X that goes through unchanged. 

2. Two runs in X being fused due to the length 1 run between them being deleted. Fused runs have no deletions. 
They have $l$ bits in total. 

3. Run of length $l + 1$ in X that suffers exactly one deletion. Bordering runs do not disappear. 

For mode 1, we define events $E_1, E_2, E_3$ as above. Probability estimates are: 

• $P[E_1, L = l] = 2^{-l-2} + O(d^{\beta/2})$, 

• $P[E_2, L = l] = 2^{-l-1} + O(d^{\beta/2})$, 

• $P[E_3, L = l] = 2^{-l-2} + O(d^{\beta/2})$, 

using Eq. (56) as we did for $L = 1$. Thus, the probability of creation from a randomly selected run via mode 1 is 

$$(1 - d)^l \big\{ P[E_1, L = l](1 - O(d^2)) + P[E_2, L = l](1 - d)(1 - O(d^2)) + P[E_3, L = l](1 - d)^2 \big\} = p_L(l) - 2^{-l}(l + 1)\, d + O(d^{1+\beta/2-\epsilon})$$

for any $\epsilon > 0$, since $l < \kappa \log(1/d)$. 

The probability of a random set of three consecutive runs being such that the middle run has length 1 and 
bordering runs have total length $l$ is $(l - 1)\, 2^{-l-1} + O(d^{\beta/2-\epsilon})$, using Eq. (56) and 
$l < \kappa \log(1/d) < d^{-\epsilon}$ for small 
enough $d$. Probability of the middle run being deleted and the other two runs being left intact, along with bordering 
runs of this set of three runs not being deleted, is $d + O(l d^2)$. Thus, probability of creation via mode 2 is 
$(l - 1)\, 2^{-l-1} d + O(d^{1+\beta/2-\epsilon})$. 

It is easy to check that the probability of mode 3 working on a randomly selected run is $(l + 1)\, 2^{-l-1} d + O(d^{1+\beta/2})$. 
Combining, we have 

$$q_L(l) = r^{-1} \big\{ p_L(l) - 2^{-l}(l + 1)\, d + (l - 1)\, 2^{-l-1} d + (l + 1)\, 2^{-l-1} d + O(d^{1+\beta/2-\epsilon}) \big\} = p_L(l) + O(d^{1+\beta/2-\epsilon})$$

This completes the proof of (i). 
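
The cancellation of the order-$d$ terms in part (i) can be verified with exact rational arithmetic. The sketch below is an illustration under stated assumptions: it plugs in the unperturbed values $p_L(l) = 2^{-l}$, drops all error terms, and uses $1/r = 1 + 2p_L(1)d + O(d^2)$ from Eq. (55); the function names are hypothetical.

```python
from fractions import Fraction

def p_star(l):
    """Run-length law of the i.i.d. Bernoulli(1/2) process: 2^{-l}."""
    return Fraction(1, 2 ** l)

def first_order_terms(l):
    """Coefficients of d contributed by the three creation modes."""
    mode1 = -p_star(l) * (l + 1)        # length-l run survives intact
    mode2 = (l - 1) * p_star(l + 1)     # two runs fused across a deleted length-1 run
    mode3 = (l + 1) * p_star(l + 1)     # length-(l+1) run loses exactly one bit
    return mode1, mode2, mode3

def first_order_shift(l):
    """Coefficient of d in q_L(l) - p_L(l), including the 1/r renormalization."""
    renorm = 2 * p_star(1) * p_star(l)  # from 1/r = 1 + 2 p_L(1) d + O(d^2)
    return sum(first_order_terms(l)) + renorm

if __name__ == "__main__":
    print([first_order_shift(l) for l in range(1, 8)])
```

The shift vanishes for every $l$, which is consistent with the fact that an i.i.d. Bernoulli(1/2) input produces an i.i.d. Bernoulli(1/2) output. For $l = 1$ the mode 2 term is zero and the formula reduces to the $q_L(1)$ computation.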
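For (ii), simply note that 

$$\mu(Y) = \frac{1}{r(X)} \times \lim_{n \to \infty} \frac{\text{Length of } Y(X^n)}{\text{Number of runs in } X^n} = \frac{(1 - d)\, \mu(X)}{r(X)} .$$

It follows from Eq. (55) that 

$$|\mu(X) - \mu(Y)| \le 4\, |p_L(1) - 1/2|\, d + \kappa_3 d^2 \tag{57}$$

for some $\kappa_3 < \infty$. Eq. (18) follows using Lemma 5.5 to bound $|p_L(1) - 1/2|$. 
□ 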



Proof of Lemma 5.10. Similar to proof of Lemma 5.9(i). We use Eq. (56) again, and make use of 
$k \le \kappa \log(1/d)$ to deduce that $\sqrt{k + 2} \le d^{-\epsilon/2}$ for small enough $d$. □ 



Proof of Lemma \5.11\ From Lemma I5.9( ii) , we know that 

00 00 



1=1 



1=1 



< Kid 



l+P/2 



Recall I = [41og(l/(i)J. Using Lemma ISTOT i) . we deduce 



1=1 



1=1 



(58) 



(59) 



From Lemma 15.31 we know that 



1=1 



Note that ki, K2, ^3 do not depend on /3. 

Combining Eqs. (|58|) . (|59p and (|60| . and using fS < 2, we arrive at the desired result. 



(60) 



□ 

Proof of Lemma 5.12. By Lemma 5.5 applied to Y, we know that 

CXD oo OO 

X] 5Z ■ • • X! \lL{k){h,l2,-- .,lk)-pl(^k)ih,-- ■,lk) < KzVkcP^^ . 
Using Lemma rS.lOi we have for d < dgiK, 7), for any integer k and {li, . . . , 1^) such that X^iLi h < Klog(l/d). 

\PL{k){h,l2, ■ ■ ■ Jk) — qL{k){h,h, • ■ • , < Kg d . 

Thus, we obtain Eq. p^ . using k < K\og{l/d) < d^^ for small d. Eq. follows. Also, note that we can deduce 

\pL{l)^pl{l)\<2K5d^/' (61) 

for small enough d. We repeat the proof of Lemma [5.9r i) (or Lemma [S.lOp . using Eq. (IT9t instead of Eq. (|56)) to 
obtain Eq. ([20]). This completes the proof of (i). 

For (ii), we proceed as follows to prove Eqs. ([2T|) and (|22|) . In the proof of Lemma [5.9f ii). we deduced that 
|^(X)-/^(Y)| < 4|pl(1)- l/2|d + Kyd^ (this is Eq. ((57)) with the constant renamed). Using Eq. ((BT|) to bound vr il). 
we obtain Eq. From Lemma O applied to H{Y), we know that |^(Y) - 2| < 7d'^/'^. Eq. ^ follows. □ 

Proof of Lemma \5.13\ Associate each run in Y with the run in X from which its first bit came. Consider any run 
Rp in X. If it gives rise to a run in Y of length A[l/(iJ , then we know that the runs i?p+i, Rp+3, . . . , i?f 4_2[a-o.iJ-i 
were all deleted (since X S iS^i/dj). This occurs with probability at most d^^~^'^K Further, for each run in X, there 
are Ai(X)(l - d)/fi{Y). This implies 

From Lemmas 15. II and 15. 9f ii). we know that |/i(X) — 2| < 0.1 and |/i(Y) — 2| < 0.1 for small enough d. Plugging into 
the above equation yields the desired result. □ 

Proof of Lemma \5.14\ We make use of Eq. (P5)) . Maximizing H{T) for fixed Jl, it is not hard to deduce that 

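$$\frac{H(T)}{\bar\mu} \le f(\bar\mu) \tag{62}$$

where 

$$f(x) = -\frac{2}{x} - \Big(1 - \frac{2}{x}\Big) \log(x - 2) + \log x ,$$

with equality iff X consists of i.i.d. super-runs with $p_T(l^{\rm rep}, l - l^{\rm rep}) = (\lambda - 1)^2 \lambda^{-l}$ where $\lambda = \bar\mu/(\bar\mu - 2)$. Now, using 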
Eq. (123D, H{X) < H{f)/Jl, and Eq. we know that we must have /(/I) > 1 - d"'^. Now, we have /(4) = 1. 

Further, it is easy to check that /(•) achieves its unique global and local maximum at 4, increasing monotonically 
before that and decreasing monotonically after that. It follows that for any fixed e > 0, for small enough d, we 
must have |/I — 4| < e. It then follows from Taylor's theorem that /(/I) < 1 — (/i — 4)^/15, so that we must have 
\Jl-4\< 4(^3/2 for d < do, where do > 0. □ 
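
The qualitative properties of $f$ used above can be checked numerically. The sketch below assumes the form $f(x) = -2/x - (1 - 2/x)\log_2(x - 2) + \log_2 x$, which is a reconstruction from the garbled display and from the two stated facts ($f(4) = 1$ and a unique maximum at 4), so it should be read as an assumption rather than the paper's verbatim formula.

```python
import math

def f(x):
    """Assumed form of the bound f in Eq. (62), logarithms base 2, x > 2."""
    return -2.0 / x - (1.0 - 2.0 / x) * math.log2(x - 2.0) + math.log2(x)

def is_unimodal_at_4(lo=2.1, hi=20.0, step=0.05):
    """Grid check: f increases up to x = 4 and decreases afterwards."""
    xs = []
    x = lo
    while x < hi:
        xs.append(round(x, 10))
        x += step
    left = [v for v in xs if v <= 4.0] + [4.0]
    right = [4.0] + [v for v in xs if v > 4.0]
    inc = all(f(a) < f(b) for a, b in zip(left, left[1:]) if a < b)
    dec = all(f(a) > f(b) for a, b in zip(right, right[1:]) if a < b)
    return inc and dec

if __name__ == "__main__":
    print(f(4.0), is_unimodal_at_4())
```

With this form, $f'(x) = (2/x^2)\,(1 - \log_2(x - 2))$, which changes sign exactly at $x = 4$; this is what forces $\bar\mu$ to be close to 4 whenever $f(\bar\mu)$ is close to 1.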

Proof of Lemma \5.15\ An explicit calculation yields 

H{f)^]l{X)-D{pf\\p*^) 

The proof now mirrors the proof of Lemma |5.3[ making use of Lemma 15.141 in place of Lemma |5. II □ 

Proof of Lemma \5.16i It is easy to see that fx = ^Pz(0//^(^) is the asymptotic fraction of bits in X that are 

part of super- runs of length at least £. Similarly, /yX^z^^ the asymptotic fraction of bits in X that 

are part of super-runs of length at least £. 

We argue that $f_Y \ge 0.9 f_X$. Consider any bit $b_P$ at position $P$ in X that is part of a super-run $S_i$ with length 
$\bar L_i \ge \ell$. Consider a contiguous substring of $S_i$ that includes $b_P$ of length exactly $\ell$. Clearly such a substring exists. 
The probability that it does not undergo any deletion is at least $1 - \ell d \ge 0.9$ for small enough $d$. Further, if this 
substring does not undergo any deletion, then all bits in this substring are part of the same super-run in Y, which 
must therefore have length at least $\ell$. It follows that bit $b_P$ is part of a super-run of length at least $\ell$ in Y with 
probability at least 0.9. Thus, we have proved $f_Y \ge 0.9 f_X$. From Lemma 5.14 it follows that $\bar\mu(X) \le 5$ and $\bar\mu(Y) \ge 3$ 
for small enough $d$. Putting these facts together leads to the result. 

oo oo 

E IPzil) < 5/x < 5/Y/0.9 < ^g-g J2 '9Z(0 < 80d^ , 
1=1 ■ 1=1 

where we have made use of Lemma 5.15 applied to Y. □ 
Proof of Corollary 5.17. Analogous to proof of Corollary 5.4. □ 
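
The step "no deletion in a window of length $\ell$ with probability at least $1 - \ell d \ge 0.9$" in the proof of Lemma 5.16 can be made concrete. The sketch below takes $\ell = \lfloor 4\log(1/d)\rfloor$ with the logarithm base 2, which is an assumption about the base; the helper name is hypothetical.

```python
import math

def window_survival(d):
    """Return (ell, union bound 1 - ell*d, exact probability (1-d)^ell)."""
    ell = math.floor(4 * math.log2(1.0 / d))
    return ell, 1.0 - ell * d, (1.0 - d) ** ell

if __name__ == "__main__":
    for d in (1e-3, 1e-4, 1e-5):
        print(d, window_survival(d))
```

The exact probability $(1 - d)^{\ell}$ always dominates the union bound $1 - \ell d$, and $\ell d \to 0$ as $d \to 0$, so the threshold 0.9 is met once $d$ is small enough.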



C Proof of Lemma 5.18 

The proof of Lemma 5.18 is quite intricate and requires us to define a new modified deletion process in terms of 
super-runs. 

Now we define a new modification to the deletion process, we call it the perturbed deletion process to avoid 
confusion with the modified deletion process D. 

The input process X is divided into super-runs as ... , 5*0, ^i, . . . (cf. Definition 231) • For all integers i, define: 

Z* = Binary process that is zero throughout except if (5;, 5^+1, 5*^+2)) have three or more deletions in total, in 
which case = 1 if and only if Xi e Si and Di = 1. 

Define 

00 

Z= ^ Z* 

2 — — 00 

where ^ here denotes bitwise OR. Finally, define ]D)(1D),X) = BffiZ (where ® is componentwise sum modulo 2). The 
output of the channel is simply defined by deleting from X" those bits whose positions correspond to Is in B. We 
define K for the modified deletion process similarly to K. 
We make use of the following fact: 

Proposition C.1. Consider any integer $m > 0$. Let $U_1, U_2, \dots, U_m$ be random variables, taking values in $\mathbb{N}$, 
that have the same marginal distribution, i.e., $U_i \sim U$ for $i = 1, 2, \dots, m$, and arbitrary joint distribution. Let 
$f_1, f_2, \dots, f_m : \mathbb{N} \to \mathbb{R}_+$ be non-decreasing functions. Then we have 

$$E\Big[\prod_{i=1}^m f_i(U_i)\Big] \le E\Big[\prod_{i=1}^m f_i(U)\Big]$$

Proof of Proposition C.1. We prove the result for $m = 2$. The proof can easily be extended to arbitrary $m \in \mathbb{N}$. 

We want to show that for random variables $U$ and $V$, with $U \sim V$, and non-decreasing, non-negative valued 
functions $f, g$, we have 

$$E[f(U) g(V)] \le E[f(U) g(U)]$$

Part I: 

Define $\mathcal{H} = \{f : E[f(U)\, \mathbb{I}(V \ge b)] \le E[f(U)\, \mathbb{I}(U \ge b)],\ \forall b \in \mathbb{R}\}$. 
Claim: The class $\mathcal{H}$ contains all non-negative, non-decreasing functions $f$. 

Proof of Claim: 

(i) We have $\mathbb{I}_{[a,\infty)} \in \mathcal{H}$, $\forall a \in \mathbb{R}$: 

$$E[\mathbb{I}(U \ge a)\, \mathbb{I}(V \ge b)] \le \min\{P(V \ge b), P(U \ge a)\} = \min\{P(U \ge b), P(U \ge a)\} = P(U \ge \max(a, b)) = E[\mathbb{I}(U \ge a)\, \mathbb{I}(U \ge b)]$$

(ii) If $f_1, f_2 \in \mathcal{H}$ then $c_1 f_1 + c_2 f_2 \in \mathcal{H}$ for any $c_1 \ge 0$, $c_2 \ge 0$. 
This follows from linearity of expectation. 

Define the class of 'simple increasing functions' 

$$\mathcal{I} = \Big\{f : \exists k \in \mathbb{N} \text{ s.t. } f = \sum_{i=1}^k c_i\, \mathbb{I}_{[a_i,\infty)} \text{ for some } c_i \ge 0,\ a_i \in \mathbb{R} \text{ for } i = 1, 2, \dots, k\Big\}$$

(iii) It follows from (i) and (ii) that $\mathcal{I} \subseteq \mathcal{H}$. 

Now, it is not hard to see that for any non-negative non-decreasing $f$, we can find a monotone non-decreasing 
sequence of functions $(f_n)_{n=1}^{\infty} \in \mathcal{I}$ such that $f_n \uparrow f$. By the monotone convergence theorem, we have 

$$\lim_{n \to \infty} E[f_n(U)\, \mathbb{I}(V \ge b)] = E[f(U)\, \mathbb{I}(V \ge b)] ,$$

$$\lim_{n \to \infty} E[f_n(U)\, \mathbb{I}(U \ge b)] = E[f(U)\, \mathbb{I}(U \ge b)] .$$

Combining with (iii), we infer that $f \in \mathcal{H}$, proving our claim. 
Part II: 

Define $\mathcal{H}_f = \{g : E[f(U) g(V)] \le E[f(U) g(U)]\}$. 

From Part I, we infer that the indicator $v \mapsto \mathbb{I}(v \ge b)$ belongs to $\mathcal{H}_f$ for all $b \in \mathbb{R}$. We now repeat the steps in the proof of the Claim in 
Part I, to obtain the result "The class $\mathcal{H}_f$ contains all non-negative, non-decreasing functions $g$." This completes our 
proof of the proposition. 

□ 
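
Proposition C.1 with $m = 2$ lends itself to a quick numerical experiment. The sketch below is an illustration, not a proof: it draws random symmetric joint distributions (a convenient, but not exhaustive, way to obtain equal marginals) and random non-negative non-decreasing functions, then compares the two expectations. All names are hypothetical.

```python
import random

def symmetric_joint(size, rng):
    """Random joint pmf of (U, V) on {0..size-1}^2 with equal marginals."""
    a = [[rng.random() for _ in range(size)] for _ in range(size)]
    s = [[a[i][j] + a[j][i] for j in range(size)] for i in range(size)]
    total = sum(map(sum, s))
    return [[v / total for v in row] for row in s]

def random_nondecreasing(size, rng):
    """Random non-negative, non-decreasing function given as a value list."""
    vals, cur = [], 0.0
    for _ in range(size):
        cur += rng.random()
        vals.append(cur)
    return vals

def both_sides(size, rng):
    """Return (E[f(U) g(V)], E[f(U) g(U)]) for one random instance."""
    joint = symmetric_joint(size, rng)
    f = random_nondecreasing(size, rng)
    g = random_nondecreasing(size, rng)
    marginal = [sum(row) for row in joint]
    lhs = sum(joint[u][v] * f[u] * g[v]
              for u in range(size) for v in range(size))
    rhs = sum(marginal[u] * f[u] * g[u] for u in range(size))
    return lhs, rhs

if __name__ == "__main__":
    rng = random.Random(1)
    print(all(l <= r + 1e-12 for l, r in (both_sides(6, rng) for _ in range(200))))
```

In the symmetric case the gap equals $\tfrac12 \sum_{u,v} P(u,v)\,(f(u) - f(v))(g(u) - g(v))$, which is non-negative term by term because $f$ and $g$ are both non-decreasing.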

Lemma C.2. There exists do > such that for any d < do the following occurs: Consider any X G S^i/d] ■ Then 

1 - - ^2 

lim -H{K{X'')\X",Y{X")) 



n->oo n /i(X 

CO OG 



PL(k+2) (1, 1, . ■ . (fc + 1 ones), Ik+l) (fc - 1 + h+l) ^ ( L _ 1 I ; — 

00 00 00 

^ PL(fe+2)(^o, 1,1,- --(fc ones), /fc+i) (^o + fc- l + 



l() — 2 k — 2 



^0 + 1 



^0 + fc - 1 + h+i 



f 5 (63) 

for some 5 such that \5\ < 18d^E[L'^]. 

Proof of Lemma \C.2[ Using the chain rule, we obtain 

M 

H{k{X-)\X-,Y{Xn) = 5]i/(|l(j)| \X{3)...X{M).Y{j)-Y{M)) 

i=i 

Consider the term tj = H{\X{j)\\X{j)...X{M),Y{j)...Y{M)). Suppose the first bit in X{j) ... is part of super- 
run Si. Call the first run in X{j) be Rp. By the construction of the perturbed deletion process, we know that 
Si, Si+i and Si+2 cannot have more than two deletions in total. 

Different cases may arise: 

If $L_p + L_{p+2} \ge |Y(j)|$ then we know that $X(j) = (R_p, R_{p+1}, R_{p+2})$. If not, then we know that 
$X(j) = (R_p, R_{p+1}, R_{p+2}, R_{p+3}, R_{p+4})$. In either case, $t_j = 0$. 

• $L_p > |Y(j)|$ 

It must be that $X(j) = R_p$. Again, $t_j = 0$. 

• $L_p = |Y(j)|$ 

In this case, if $L_{p+1} > 1$ or $L_{p+2} > 1$, then we know that $X(j) = R_p$ and $t_j = 0$. Suppose $L_{p+1} = L_{p+2} = 1$. 

Now consider the possibility that $X(j) = (R_p, R_{p+1}, R_{p+2})$ (this is the only alternative to $X(j) = R_p$). For 
this possibility to exist, the following condition must hold 

$$\mathcal{C} \equiv \{Y(j+1) Y(j+2) \dots \text{ must match exactly } R_{p+3} R_{p+4} R_{p+5} \dots \text{ until the end of } S_{i+2}\} \cap \{L_{p+1} = L_{p+2} = 1\}$$

(Else, we would need more than two deletions in $(S_i, S_{i+1}, S_{i+2})$, a contradiction.) 
Note that in any case, there are at most two possibilities for $X(j)$, so we have $t_j \le 1$. 

Let us understand $\mathcal{C}$ better. Let $S_i$ include $k$ runs to the right of $R_p$, i.e., $L_{p+1} = L_{p+2} = \dots = L_{p+k} = 1$ and 
$L_{p+k+1} > 1$. Condition $\mathcal{C}$ can arise, along with $X(j)$ starting at $R_p$ iff: 

• Run $R_{p-1}$ does not disappear under $D$. 

• Super-runs $(S_i, S_{i+1}, S_{i+2})$ undergo no more than two deletions in total. Event $E$. 

• One of the following deletion patterns occurs: 

- (Only if $L_p > 1$) The bit $R_{p+1}$ is deleted and one deletion in $R_p$. Event $E_1$. 

- The bits $R_{p+1}$ and $R_{p+2}$ are deleted. Event $E_2$. 

- The bits $R_{p+2}$ and $R_{p+3}$ are deleted. Event $E_3$. 

- ... 

- The bits $R_{p+k-1}$ and $R_{p+k}$ are deleted. Event $E_k$. 

- The bit $R_{p+k}$ is deleted and one deletion in $R_{p+k+1}$. Event $E_{k+1}$. 

Define $p_0 = (1 - d)^{\bar L_i + \bar L_{i+1} + \bar L_{i+2} - 2}$. It is easy to see that $P(E_1 \cap E) = p_0 d^2 L_p$, $P(E_l \cap E) = p_0 d^2$ for $l = 2, 3, \dots, k$ 
and $P(E_{k+1} \cap E) = p_0 d^2 L_{p+k+1}$. We know that exactly one of these has occurred. $(E_1 \cap E) \cup (E_2 \cap E)$ leads to 
$X(j) = (R_p, R_{p+1}, R_{p+2})$, whereas all other possibilities lead to $X(j) = R_p$. It follows that if $\mathcal{C}$ holds, $L_p = l_p$ and 
$L_{p+k+1} = l_{p+k+1}$, 

$$t_j = h\left( \frac{l_p \mathbb{I}(l_p > 1) + 1}{l_p \mathbb{I}(l_p > 1) + k - 1 + l_{p+k+1}} \right) .$$

Let $R_p$ be a uniformly random run (cf. Section 5). The probability of seeing $L_p = l_p$, $k$, $L_{p+k+1} = l_{p+k+1}$ and 
$(E_1 \cup E_2 \cup \dots \cup E_{k+1}) \cap E$ is 

$$p_{L(k+2)}(l_p, 1, 1, \dots (k \text{ ones}), l_{p+k+1})\; p_0 d^2\, \big(l_p \mathbb{I}(l_p > 1) + k - 1 + l_{p+k+1}\big)$$

where $p_0$ is as defined above. It is easy to see that $p_0 \in (1 - d(\bar L_i + \bar L_{i+1} + \bar L_{i+2}), 1)$. Also, the conditional 
probability of $R_{p-1}$ not disappearing is in $(1 - d, 1)$. Thus the expected contribution of $R_p$ to the sum is 

$$d^2 \sum_{l_p=1}^{\infty} \sum_{k=2}^{\infty} \sum_{l_{p+k+1}=2}^{\infty} p_{L(k+2)}(l_p, 1, 1, \dots (k \text{ ones}), l_{p+k+1})\, \big(l_p \mathbb{I}(l_p > 1) + k - 1 + l_{p+k+1}\big)\; h\left( \frac{l_p \mathbb{I}(l_p > 1) + 1}{l_p \mathbb{I}(l_p > 1) + k - 1 + l_{p+k+1}} \right) + \delta ,$$

where $|\delta| \le 2 d^3 E[(\bar L_i + \bar L_{i+1} + \bar L_{i+2})^2] \le 18 d^3 E[\bar L^2]$, using Proposition C.1 in the final inequality. The result follows. □ 
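
The entropy term $t_j$ in the proof above is the binary entropy of the posterior probability that the parent block consists of three runs. A minimal sketch, assuming the pattern weights stated in the proof (weight $l_p$ for $E_1$ when $l_p > 1$, weight 1 for each of $E_2, \dots, E_k$, weight $l_{p+k+1}$ for $E_{k+1}$); the function names are hypothetical.

```python
import math

def binary_entropy(p):
    """h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def ambiguity_entropy(l_p, k, l_next):
    """t_j when condition C holds, L_p = l_p and L_{p+k+1} = l_next.

    Requires k >= 2 length-1 runs after R_p and l_next >= 2."""
    head = l_p if l_p > 1 else 0      # l_p * 1(l_p > 1): positions available to E_1
    total = head + (k - 1) + l_next   # E_1, then E_2..E_k, then E_{k+1}
    three_run = head + 1              # E_1 and E_2 give X(j) = (R_p, R_{p+1}, R_{p+2})
    return binary_entropy(three_run / total)

if __name__ == "__main__":
    for args in ((1, 2, 2), (2, 2, 2), (3, 2, 4)):
        print(args, ambiguity_entropy(*args))
```

Since the deletion patterns are equally likely up to the common factor $p_0 d^2$, the posterior is just a ratio of counts, which is why no dependence on $d$ remains inside $h(\cdot)$.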

Corollary C.3. For any $\epsilon > 0$, there exists $d_0 = d_0(\epsilon) > 0$, and $\kappa < \infty$ such that for any $d < d_0$ the following 
occurs: Consider any X g S\^ifd\ such that H{1C) > 1 — d^"*^ and max{iJ(X), iJ(Y)} > 1 — d'^ for some 7 G (1/2, 2). 
Then 

1 ^ ^ 
lim -i/(ii:(X")|X",y(X")) = — 

00 CXD 



CXD CXD / 1 \ 



A;— 2 //s-j-i— 2 

CXD 00 CXD 



Iq—2 k—2 /i. 



1 



^0 + fc - 1 + h+i 



f ?7 (64) 
/or some rj such that \ri\ < K(f}^'^ . 

Proof of Corollaru \C.3[ We prove the corollary assuming H{Y) > 1 — d'' . The proof assuming _ff(X) > 1 — d''' is 
analogous. 

Consider the second summation in Eq. (|63|) . Define £ = [41og(l/(i)J . Consider any term with la < i, k < £, 
Ik+i < i. Using Lemma [5?T2l fi) (Eq. we have 

\pLik+2) {lo, 1, 1, . . . (fc ones), - 2-('"+'=+''=+i) | < 

for d < do{e). Note that do does not depend on ^o, k, It+i- It follows that 

X] X] X! PL(k+2){ioAA, ■ ■ -{k ones),lk+i) {Iq + k - 1 + Ik+i) h\ " 



An + k — 1 + 



CXD 00 CXD 



/n=2 /c = 2 



^0 + 1 



/o + fc - 1 + 



where |(52i| < d'^/2-£/2^ 

We make use of Lemma 5.16 to bound the error due to the missed terms. Let $\bar l_0$ be the length of the super-run 
containing the initial run of length $l_0$. Clearly, $\bar l_0 \ge l_0 + k$. Let $\bar l_1$ be the length of the next super-run to the right. 
Clearly, $\bar l_1 \ge l_{k+1}$. Now 

$$\{l_0 > \ell\} \text{ OR } \{k > \ell\} \text{ OR } \{l_{k+1} > \ell\} \;\Rightarrow\; \{l_0 + k + l_{k+1} > \ell\} \;\Rightarrow\; \{\bar l_0 + \bar l_1 > \ell\}$$

Also, $(l_0 + k - 1 + l_{k+1}) \le \bar l_0 + \bar l_1$ and $h(p) \le 1$ for any $p$. It follows that the missed terms contribute 

$$\delta_{22} \le \sum_{\bar l_0 + \bar l_1 > \ell} p_{\bar L(2)}(\bar l_0, \bar l_1)\, (\bar l_0 + \bar l_1) \le d^{\gamma/2 - \epsilon/2}$$

to the sum, where we have used Lemma 5.16 in the second inequality. 
Thus, we have established 



00 CXD 00 



EE E PL{k+2){loAA,---{k ones),lk+i) {lo + k-l + lk+i)h 
2 

00 00 00 

= EE E 2-('°+'=+''=+^) (?o + - 1 + /fc+i) 



Iq^2 k^2 lk + 1^2 

00 CXD 00 



lo + l 



lo=2 k=2 U 



Iq + k-l + h+i 



lQ + k- \+ Ik+i 

with $|\delta_2| \le 2 d^{\gamma/2 - \epsilon/2}$ for $d < d_0(\epsilon)$. The first summation in Eq. (63) can be similarly handled. Finally, Lemma 
5.12(ii) tells us that $|\mu(X) - 2| \le d^{\gamma/2}$ for small enough $d$. Putting the estimates together yields the result. □ 

Proof of Lemma 5.18. We prove the lemma assuming $H(Y) > 1 - d^{\gamma}$. The proof assuming $H(X) > 1 - d^{\gamma}$ is 
analogous. 

It is easy to verify that the right hand side of Eq. (|64|) is, in fact, d^C4 + -q. We show that 

lim i|i/(if(X")|X",y(X"))-i/(if(X")|X",y(X"))| <(ii+'^-"/2 (65) 

whence Eq. (|26)) follows using Corollarv lC.31 

Consider Z" defined in our construction of the perturbed deletion process. We define U{X^, Z?", Z") e {t, 0, l}'^' 
constructed as follows: Start from the first bit in Y and consider bits sequentially 

• For each bit also present mY ,U has a t. 

• For each bit not present in F, [/ has if that bit and a 1 if that bit is 1. 
Clearly, the corresponding stationary process U can also be defined. 

Recah Fact [OS It is not hard to see that ^ (X",?) and {X^'^Y.K) < ^^'^^ > (X",?,^). It follows 

that 

|iy(^(X")|X",Y-(X")) < 2H{U) + H{Z) 

Let z = V[Zj = 1] for arbitrary j. The number of deletions reversed in a random super-run is at most 
^^Sio h i2 PZ(3)(^0' 'i' ^2)(^o + ^1 + ^2)'^ in expectation (similar to Eq. ([28])). Using Proposition [C]ll this is bounded 
above by 27d^E,[L^]. Since each super-run has length at least one, it follows that z < 27d^E[L^]. Using Lemma 
EUSland L < 1/d w.p. 1, we find that E[L^] < d^-^ for smah enough d. Hence, z < 27d'^+'''. It follows that 
H{Z) < h{z) < d^+'^-'-l'^ for smaU enough d. 

Let u = f{Uj ^ t) for arbitrary j. Then u = z/(l - d). It follows that H{1]) <u + h{u) < d^+'^'^Z'^ for smah 
enough d. Finally, we have 

lim ^H{U)+H{Z) ^ _ ^)^(u) + H(Z) < 3d'+^-^/^ 

n— >oo 77, 

leading to the desired bound Eq. (|65l) . □ 



D Proof of Lemma 5.23 and its corollaries 

Proof of Lemma \5.23[ We make use of and the fact that X is stationary and ergodic. Consider a randomly 
chosen run Rp in X. We associate H{D{j)\X{j),Y{j)) with Rp if Rp is the first run in X{j)- Denote by £p+i the 
length of Rp+i for any integer i. We add contributions from the three possibilities of how Y{j) arose under D{j): 

1. From a single parent run 
Define 

$$B_1 \equiv \{R_p \text{ suffers one or two deletions under } \hat D \text{ and } \exists j \text{ s.t. } \hat X(j) = R_p\}$$

Clearly, $B_1$ is exactly the event we are interested in here. We will restrict attention to a subset of $B_1$ and 
then prove that we are missing a very small contribution. Define 

$$E_1 \equiv B_1 \cap \{R_{p-1} \text{ and } R_{p+1} \text{ do not disappear under } D\}$$
Consider Bi\Ei. For this event, one of the following must occur: 

• Run $R_{p-1}$ disappears under $D$ but not under $\hat D$. For this, we need at least 3 deletions in run $R_{p-1}$. A 
simple calculation shows that this occurs with probability less than $d^3 L_{p-1}^3$. 

• Run $R_{p-1}$ disappears under $\hat D$ as well. In this case $R_{p-2}$ also disappears under $\hat D$. Thus, we need $R_{p-1}$ 
and $R_{p-2}$ both to disappear under $D$, which occurs with probability at most $d^2$. Moreover, we require at 
least one deletion in $R_p$ (probability less than $L_p d$). Thus, the overall probability is bounded above by 
$d^3 L_p$. 

• Run $R_{p+1}$ disappears under $D$ but not under $\hat D$. For this, we need at least 3 deletions in run $R_{p+1}$. This 
occurs with probability less than $d^3 L_{p+1}^3$. 

Thus, $0 \le P(B_1 \setminus E_1) \le d^3 (L_{p-1}^3 + L_p + L_{p+1}^3)$. The largest possible value of $H(\hat D(j)|\hat X(j), \hat Y(j))$ for a particular 
occurrence of $B_1 \setminus E_1$ is $\max_{i=1,2} \log \binom{L_p}{i} \le 2 \log L_p$. Thus, the additive error introduced by restricting to $E_1$ 
in our estimate of $\lim_{n\to\infty} \frac{1}{n} H(\hat D^n|X^n, \hat Y, \hat K)$ is 

$$0 \le \delta_{1E}(d, X) \le d^3 E[2 (L_{p-1}^3 + L_p + L_{p+1}^3) \log L_p] \le 6 d^3 E[L^3 \log L] \tag{66}$$

where we have made use of Proposition C.1. 
Partition $E_1$ into two events: 

$$B_{11} \equiv E_1 \cap \{R_p \text{ undergoes one deletion under } \hat D\} \tag{67}$$

$$B_{12} \equiv E_1 \cap \{R_p \text{ undergoes two deletions under } \hat D\} \tag{68}$$

Let $T_1$ be the contribution of $B_1$, $T_{11}$ be the contribution of $B_{11}$ and $T_{12}$ be the contribution of $B_{12}$. Then we 
have 

$$T_1 = T_{11} + T_{12} + \delta_{1E} \tag{69}$$

• One deletion in Rp: 

Consider Bn. The contribution of a particular occurrence is logLp. Now 

P{Bii,Lp = /, Lp_i = Zp_i, Lp+i = Ip+i) 

= Pi(3)(/-i, /+i) (1 - (1 - rf'-+i) /pd(l - rf)'-i (70) 

We have, for I > \, 

Pi(3)(>l, /, >1) ld{l - dy~^ (1 - 2d^) < P{Bii,Lp = /, Lp_i > 1, Lp+i > 1) 

< PLi3)i>l,l,>l)ld{l-dy-^ 

since probability that Rp-i of length greater than 1 disappears is bounded above by d^ and similarly for 
Rp+i- It follows that 

P(Sii, ip = I, Lp^i > 1, Lp+i > 1) = PL(3)(>1, /, >1) ld{l -{I- l)d) + mAi) 

^2d^PUz){>U.>l)l < Via{1) < dV(3)(>l,/,>l)^(^^ 

Similarly we get 

3ii,Lp^l,Lp^i = l,Lp+i = 1) = Pi(3)(l,/,1)W(1- (? + l)(i) + 771,4(0 

< 771.4(0 < d3p^(3)(i,0i)^(' 
and 

V{Bn,Lp = Z, ip^i > 1, Lp+i = 1) = Pl(3)(>1, 1 1) ^^(1 - Id) + r?i,3(0 



and 



11, Lp = Z,ip_i = l,Lp+i > 1) = pi(3)(l,/,>l)Zd(l- Zd) + r?i,2(0 

-dV(3)(l,/,>l)^ < ^1,2(0 < dV(3)(l,^>l)/Q) 

Combining, we arrive at the following contribution of Bn to \\mn^oo H{D^^\X^^ /K) /n\ 
Til ^ j^J2^{B^i,Lp ^ l)\ogl 



=2 

oo 



with 



^ {pL(3)(>i,^ >i) nog?(i - - +PL(3)(i,/, 1) nogi{i -{1 + l)d)- 

1=2 

(Pi(3)(i, z, >i) + Pi(3)(>i, z, 1)) nog;(i - w)| + (5ii 



2^3 °° rf3 °o /; I 1 \ 

EpiWnogz<5ii = Jii(d,x)<— ; log/ 



We have normahzed by /i(X) to move from a per run contribution to a per bit contribution. 
It is easy to infer 

-d^¥.\L^\ogL\ < 5ii < d^E[L^\ogL] 

from Eq. ([72l). 

Two deletions in Rp: 

Consider Bi2. If Lp = I > 2 then entropy contribution is log (2). We have, for I > 2, 

P{B2,Lp = /) - dy-'^ ■ F{Rp_i and i?p+i do not disappear under B) 

It follows that 

Pl{1) Q d^i^ - dy < nB2,Lp = l)< pl{1) Q ^'(1 - rf)'"' 



leading to 



KB2,Lp^l)^pUl)Qd' + V2 

-d^PLii)i(!^ <m<o 

Combining, we arrive at the following contribution to $\lim_{n\to\infty} H(\hat D^n|X^n, \hat Y, \hat K)/n$: 



with 



(74) 



j3 °° 

-d^ElL^logL] < ^ ) log ( ^ ) < = 5r2[d,X) < (75) 



1=3 



Plugging Eqs. ((7T|) and ((71| into Eq. (IMl) . we obtain our desired estimate on the contribution Ti of the event 

^1 = TxT E {pL{3}i>'^J, >i) ^log^i - - 1)'^) +PL(3){'^,i, 1) nogZ(i - (/ + 



1=2 



(PL(3)(1,',>1)+PL(3)(>1,^1)) nog;(l-W)} 



where Si = Jib + Sn + (5i2 is bounded using Eqs. ([66|) . (|73|) and (f75|) as 

-2d3E[L3 log L]<5i < 7d^E[L^ log L] . (76) 

2. From a combination of three parent runs 

Define 

B3 = Rp and Rp+2 suffer at least one deletion in total under D and 
3j s.t. X{]) = {RpRp+iRp+2) 

We are interested in the contribution due to occurrence of event B^. 

Again, we will restrict attention to a subset of -B3 and the prove that we are missing a very small contribution. 
Define 

E'a = i^a n {i?p_i and Rp+3 do not disappear under D.} 
Similar to our analysis for Case 1, we can show that 

< nBzXEi) < d^L%_, +Lp + Lp+2 + L%+i) . 
The largest possible value of H{ID{j)\X{j),Y{j)) for a particular occurrence of -B3\i?3 is 

max log (^^ + Lp+2\ ^ 4iog(Lp + Lp+2) 

4=1,2,3,4 \ t J 

since Rp and Rp+2 can suffer at most 4 deletions in total under D. Thus, the additive error introduced by 
restricting to in our estimate of lim„^oo ■^H{D'^\X'^ , Y, K) is 

< 53E{d, X) < d^E[A{L%_^ + Lp + Lp+2 + L^^+i) log(Lp + Lp+2)] (77) 

Now, $\log(L_p + L_{p+2}) \le \log(2 L_p L_{p+2}) = 1 + \log L_p + \log L_{p+2}$. From Proposition C.1, $E[L_{p-1}^3 \log L_p] \le 
E[L^3 \log L]$, also $E[L_p \log L_{p+2}] \le E[L \log L]$, and so on. Plugging into Eq. (77), we arrive at 

$$0 \le \delta_{3E}(d, X) \le d^3 E[16 L^3 + 32 L^3 \log L] \tag{78}$$
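
The elementary inequality $\log(a + b) \le 1 + \log a + \log b$ for run lengths $a, b \ge 1$, and its three-term analogue $\log(a + b + c) \le 2 + \log a + \log b + \log c$ used later for five parent runs, can be checked exhaustively on a small range. The sketch below assumes base-2 logarithms; the function names are hypothetical.

```python
import math

def two_term_gap(a, b):
    """1 + log a + log b - log(a + b); non-negative since a + b <= 2ab."""
    return 1 + math.log2(a) + math.log2(b) - math.log2(a + b)

def three_term_gap(a, b, c):
    """2 + log a + log b + log c - log(a + b + c); uses a + b + c <= 4abc."""
    return (2 + math.log2(a) + math.log2(b) + math.log2(c)
            - math.log2(a + b + c))

def min_gaps(limit=40):
    """Smallest gaps over all integer run lengths in [1, limit]."""
    rng = range(1, limit + 1)
    g2 = min(two_term_gap(a, b) for a in rng for b in rng)
    g3 = min(three_term_gap(a, b, c) for a in rng for b in rng for c in rng)
    return g2, g3

if __name__ == "__main__":
    print(min_gaps())
```

The two-term bound is tight only at $a = b = 1$, because $2ab - a - b = a(b - 1) + b(a - 1)$ vanishes only there; the three-term bound always has slack.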

Now, we further restrict to a subset of E^. Define 

B31 = i?3 n {One deletion in total in Rp,Rp+2} H {Lp+i = 1} 

Consider the event E3\B3i. This can occur due to one of the following: 

• More than one deletion in Rp,Rp^2- This occurs with probability at most (since we also 
need Rp+i to disappear). 

• ip+i > 1: Now the probability that Rp+i disappears is at most . Thus, the probability of P(£'3 n 
{Lp+i > 1}) < {Lp+Lp+2)d^ 

It follows from union bound that P(£'3\i33i) < (P{Lp + Lp+2)^- As before, the largest possible value of 
H{D{j)\X{j)^Y{i)) for a particular occurrence of E^\Bj,i is 41og(Lp + Lp+2)- Thus, the additive error 
introduced by restricting to B^i in estimating the contribution of £"3 is 

< (532 < 4d3(ip + Lp+2)^ log(Lp + Lp+2) 

Now, we use log(Lp + Lp+2) < 1 + logLp + log_Lp+2 and Proposition IC. II to obtain 

< (532 < d^E[16L2 + 32L2 bg L] (79) 

Denoting by T31 the contribution of -B31, and T3 the contribution of B^, we have 

T3 = T31 + 5zE + ^32 (80) 

We consider two cases in estimating $T_{31}$:

• $L_p > 1$

The value of $H(D(j)|X(j),Y(j))$ for a particular occurrence is $\log(L_p + L_{p+2})$. We have

$$P(B_{31}, L_p = l_0, |R_{p+2}| = l_2) = d^2\,p_{L(3)}(l_0,1,l_2)(l_0 + l_2) + \eta_{3,1},$$
$$-d^3\,p_{L(3)}(l_0,1,l_2)(l_0 + l_2)^2 \le \eta_{3,1} \le 0.$$

• $L_p = 1$

The value of $H(D(j)|X(j),Y(j))$ for a particular occurrence is $\log L_{p+2}$, since $R_p$ should not disappear. We have

$$P(B_{31}, L_p = 1, L_{p+2} = l_2) = d^2\,p_{L(3)}(1,1,l_2)\,l_2 + \eta_{3,2},$$
$$-d^3\,p_{L(3)}(1,1,l_2)\,l_2^2 \le \eta_{3,2} \le 0.$$
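The form of the corrections $\eta_{3,1}$ and $\eta_{3,2}$ can be illustrated with a small computation. The sketch below rests on a simplifying assumption that is not spelled out in the text: conditional on the run lengths, the event requires the middle run of length 1 to be deleted (probability $d$), exactly one deletion among the $n$ eligible bits of the outer runs, and, in the case $L_p = 1$, the single bit of $R_p$ to survive. The factor $p_{L(3)}$ is omitted, and all names are hypothetical.

```python
def eta(n_bits: int, d: float, must_survive: int = 0) -> float:
    """Exact conditional probability minus its leading term d^2 * n_bits.

    n_bits:       bits among which exactly one deletion must occur
                  (l0 + l2 when L_p > 1, l2 when L_p = 1).
    must_survive: extra bits that must all survive (1 when L_p = 1, else 0).
    """
    main = d * n_bits * d  # leading term d^2 * n_bits
    exact = main * (1.0 - d) ** (n_bits - 1 + must_survive)
    return exact - main

def within_stated_bounds(n_bits: int, d: float, must_survive: int = 0) -> bool:
    """Check -d^3 * n_bits^2 <= eta <= 0 (up to floating-point rounding)."""
    lower = -(d ** 3) * n_bits ** 2
    return lower * (1 + 1e-9) <= eta(n_bits, d, must_survive) <= 0.0

if __name__ == "__main__":
    print(within_stated_bounds(5, 0.01), within_stated_bounds(3, 0.01, must_survive=1))
```

Under this assumption the lower bound is Bernoulli's inequality $(1-d)^m \ge 1 - md$ with $m \le n$, which matches the shape of the two-sided bounds on $\eta_{3,1}$ and $\eta_{3,2}$.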

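Combining the two cases, we arrive at the following estimate:

$$T_{31} = \frac{1}{\mu(X)}\sum_{l_0,l_2} P(B_{31}, L_p = l_0, |R_{p+2}| = l_2)\,\log\big(l_2 + l_0\,\mathbb{I}(l_0 > 1)\big)$$
$$\phantom{T_{31}} = \frac{d^2}{\mu(X)}\Big(\sum_{l_0>1,\,l_2} p_{L(3)}(l_0,1,l_2)(l_0 + l_2)\log(l_0 + l_2) + \sum_{l_2} p_{L(3)}(1,1,l_2)\,l_2\log l_2\Big) + \delta_{31} \qquad (81)$$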
where

$$-\frac{d^3}{\mu(X)}\sum_{l_0,l_2} p_{L(3)}(l_0,1,l_2)(l_0 + l_2)^2\log(l_0 + l_2) \le \delta_{31} \equiv \delta_{31}(d,X) \le 0$$



Again, we use $\log(L_p + L_{p+2}) \le 1 + \log L_p + \log L_{p+2}$ and Proposition C.1 to obtain

$$-d^3\,E[4L^2 + 8L^2\log L] \le \delta_{31} \equiv \delta_{31}(d,X) \le 0 \qquad (82)$$

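Finally, we plug Eq. (81) into Eq. (80) to obtain

$$T_3 = \frac{d^2}{\mu(X)}\Big(\sum_{l_0>1,\,l_2} p_{L(3)}(l_0,1,l_2)(l_0 + l_2)\log(l_0 + l_2) + \sum_{l_2} p_{L(3)}(1,1,l_2)\,l_2\log l_2\Big) + \delta_3$$

where $\delta_3 \equiv \delta_{3E} + \delta_{31} + \delta_{32}$. Using Eqs. (78), (79) and (82), we obtain

$$-d^3\,E[4L^2 + 8L^2\log L] \le \delta_3 \le d^3\,E[32L^2 + 64L^2\log L] \qquad (83)$$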

3. From a combination of five parent runs

Define

$$B_5 \equiv \{R_p, R_{p+2}, R_{p+4} \text{ suffer at least one deletion in total under } D \text{ and } \exists\, j \text{ s.t. } X(j) = (R_pR_{p+1}R_{p+2}R_{p+3}R_{p+4})\}$$

We have $P(B_5) \le d^3(L_p + L_{p+2} + L_{p+4})$ since $R_{p+1}$ and $R_{p+3}$ must disappear. Also, the largest possible value of $H(D(j)|X(j),Y(j))$ for a particular occurrence is

$$\max_{t=1,2,\ldots,6} \log\binom{L_p + L_{p+2} + L_{p+4}}{t} \le 6\log(L_p + L_{p+2} + L_{p+4})$$

since each run can suffer at most two deletions under $D$. Thus, the contribution of $B_5$ is $\delta_5$, where

$$0 \le \delta_5 \le 6d^3\,E[(L_p + L_{p+2} + L_{p+4})\log(L_p + L_{p+2} + L_{p+4})] \le d^3\,E[36L + 54L\log L] \qquad (84)$$

where we have used $\log(L_p + L_{p+2} + L_{p+4}) \le 2 + \log L_p + \log L_{p+2} + \log L_{p+4}$ and Proposition C.1.
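The constants 36 and 54 in Eq. (84) come from expanding $6(L_p + L_{p+2} + L_{p+4})(2 + \log L_p + \log L_{p+2} + \log L_{p+4})$ and bounding each resulting term by $E[L]$ or $E[L\log L]$. The sketch below counts those terms and checks the three-run logarithm inequality on a finite range. It assumes base-2 logarithms, and the function names are illustrative.

```python
from itertools import product
from math import log2

def log_sum_bound_holds(max_len: int = 30) -> bool:
    """Check log2(a+b+c) <= 2 + log2 a + log2 b + log2 c for 1 <= a, b, c <= max_len."""
    return all(
        log2(a + b + c) <= 2 + log2(a) + log2(b) + log2(c) + 1e-12
        for a, b, c in product(range(1, max_len + 1), repeat=3)
    )

def coefficient_count(prefactor: int = 6, runs: int = 3, constant: int = 2) -> tuple:
    """Count the terms in prefactor * (sum of run lengths) * (constant + sum of logs).

    Returns (number of terms of the form L_i, number of terms of the form L_i log L_j).
    """
    linear = prefactor * runs * constant  # each is bounded via E[L]
    cross = prefactor * runs * runs       # each is bounded via E[L log L]
    return linear, cross

if __name__ == "__main__":
    print(log_sum_bound_holds(), coefficient_count())
```

The inequality itself follows from $a + b + c \le 3abc$ for $a, b, c \ge 1$ together with $\log_2 3 \le 2$.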

4. From a combination of $2k+1$ parent runs for $k \ge 3$

Define

$$B_{2k+1} \equiv \{\exists\, j \text{ s.t. } X(j) = (R_pR_{p+1}\ldots R_{p+2k})\}$$

We need $k$ runs to disappear, and this occurs with probability at most $d^k$. The largest possible value of $H(D(j)|X(j),Y(j))$ for a particular occurrence is $2(k+1)\log(L_p + L_{p+2} + \ldots + L_{p+2k}) \le 2(k+1)\log((k+1)/d)$ since no run has length exceeding $1/d$. Thus, the contribution of $B_{2k+1}$ is bounded above by $d^k\,2(k+1)\log((k+1)/d)$. Summing, we find that the overall contribution $T_{>5}$ of $B_7, B_9, \ldots$ is bounded as

$$0 \le T_{>5} \le \sum_{k=3}^{\infty} d^k\,2(k+1)\log((k+1)/d) \le 10\,d^3\log(1/d) \qquad (85)$$

for small enough $d$.
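How small $d$ must be for the last inequality in Eq. (85) can be explored numerically. The following sketch assumes base-2 logarithms and truncates the series at a finite index. The truncation point and the sample values of $d$ are arbitrary choices made for illustration.

```python
from math import log2

def tail_sum(d: float, k_max: int = 200) -> float:
    """Truncated series sum_{k=3}^{k_max} d^k * 2(k+1) * log2((k+1)/d)."""
    return sum(d ** k * 2 * (k + 1) * log2((k + 1) / d) for k in range(3, k_max + 1))

def stated_bound(d: float) -> float:
    """Right-hand side 10 * d^3 * log2(1/d) of Eq. (85)."""
    return 10 * d ** 3 * log2(1 / d)

if __name__ == "__main__":
    for d in (1e-2, 1e-3, 1e-4):
        print(d, tail_sum(d), stated_bound(d), tail_sum(d) <= stated_bound(d))
```

With base-2 logarithms the $k = 3$ term alone equals $8d^3(2 + \log(1/d))$, so the bound can only hold once $\log(1/d)$ exceeds 8. This is consistent with the qualifier "for small enough $d$".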
Finally, we obtain

$$\lim_{n\to\infty}\frac{1}{n}H(D^n|X^n,Y,K) = T_1 + T_3 + T_5 + T_{>5}$$
$$= \frac{d}{\mu(X)}\sum_{l=2}^{\infty}\Big\{p_{L(3)}(>1,l,>1)\,l\log l\,(1 - (l-1)d) + \big(p_{L(3)}(1,l,>1) + p_{L(3)}(>1,l,1)\big)\,l\log l\,(1 - ld)$$
$$\qquad + p_{L(3)}(1,l,1)\,l\log l\,(1 - (l+1)d)\Big\}$$
$$+ \frac{d^2}{\mu(X)}\Big(\sum_{l_0>1,\,l_2} p_{L(3)}(l_0,1,l_2)(l_0 + l_2)\log(l_0 + l_2) + \sum_{l_2} p_{L(3)}(1,1,l_2)\,l_2\log l_2\Big) + \delta$$

where (5 = + (^3 + (Js + Tgts . Rearranging gives Eq. ([50]) , whereas Eq. (PT|) follows for small enough d from Eqs. ([7S|) , 
([M)) and ([55]) and the fact that no run has length exceeding 1 /d. □ 



Proof of Corollary 5.24. We prove the corollary assuming $H(Y) > 1 - d^{\gamma}$. The proof assuming $H(X) > 1 - d^{\gamma}$ is analogous.

It follows from Fact [5^ that if H{Y) > 1 - then 6 (cf. Eq. ^) is bounded as \S\ < Kid'^+'^ log{l / d) < 
^i+j-e/2 gjjjg^jj enough d, for some ki < cx). 

Consider X]/^2 Pi(0^^ log ^- We separately analyze the first Iq — [41og(l/(i)J terms of the sum. We use Lemma 
\UJT[ i) (Eq. (Ull)) to deduce that 

Y,Pl{1)1^ log I = Y^PlW log ' + a , (86) 

(=2 (=2 
with 1^1 1 < K4d^/'~'/^(?o)^ < Ksrf^/'"'/' , 

for small enough d. Next, we use Lemma 15.71 to deduce that 

00 Ll/<iJ 

J2 PL{l)l^\ogl^ PL{l)l^'^ogl < Ked^l/d)\og{l/d) < Krd^-'/^ (87) 

l=lo + l l=l(, + l 

for small enough d. Finally, Lemma [5.12( 11) tells us that 

lAi(X) - 2| < Ksd'^^^ 
Combining with Eqs. ([55)) and (1571) . it follows that 



j2 00 w2 I °° 

J2puiriogi^^\j2pi{iriogi 



^W..2 



V2 



where |?72| < ngd'^^'^/^ '/^ < n^d^^'^ "^Z^, for small enough d. 

Other terms in Eq. (pO|) can be similarly analyzed. The result follows. □ 



Proof of Corollarv \5.26[ We prove the corollary assuming H{Y) > 1 — d''. The proof assuming i/(X) > 1 — d'*' is 
analogous. 

By definition, $D^n$ is independent of $X^n$, so $H(D^n) = H(D^n|X^n) = nh(d)$, where $h(\cdot)$ is the binary entropy function. We have, for $Y = Y(X^n)$,

H{Y,K\X'') = H{D''\X'') - H{D''\X''\Y,K) 
= nh{d) - H{D''\T\ r, k) + nSi 
with |(5i(d,X)| < 2iJ(Z")/7i 2h{z). It follows from Corollary[5^ with 7 = 2- e/2, that 

1 rl °° 

hm -H{Y{X^^),K{X'^)\X") = h{d)--—y2pL{l)nogl-d^C3 + 62 (88) 

with |(52| < 2/i(z) + Kid^^'-. From Proposition 15.221 we know that z < Kid'^^^Z^. It follows that h{z) < K2d^~'^ and 
hence 1^21 < Ksd'^"'^. Simple calculus gives 

$$h(d) = d\log(1/d) + (d - d^2/2)/\ln 2 + \delta_3 \qquad (89)$$

l^sl < K^d^. Using Lemma [5TT2f i) (Eq. ^) and Lemma [5Jl we obtain 

oo e. 

^PL(z)/iogZ = ^<zi(0nog/ + 54 (90) 

;=2 1=2 

where |^4| < Kc,d^~'^ for small enough d. Using Lemma [5 . 12( ii) fEq . ([22])) and /i(X) > 1 (from Lemma [Q]) . we obtain 

1 1 ^ ,2-. 



< Ked^"' (91) 



^Ji{X) A^(Y) 

Also, it follows from |/i(Y) — 2| < Tc?^"*^/^ fLemma 15.11 applied to Y) and elementary calculus that 

MY)}-i = i-iMY) + 55 



1 ^ 

= 1-jY.1lW + 5^ (92) 



where l^el < Kyd^"'^. Here we have used Lemma [Ol (applied to Y) to bound Y^^i+i <1l{1)1- 
Plugging Eqs. dHU), ^ and ^ into Eq. ([HHl), we obtain the resuh. 

□ 
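The expansion of the binary entropy in Eq. (89) can be checked numerically. This minimal sketch assumes base-2 logarithms and compares $h(d)$ with the two-term approximation. The function names are illustrative.

```python
from math import log, log2

def h(d: float) -> float:
    """Binary entropy in bits."""
    return -d * log2(d) - (1 - d) * log2(1 - d)

def h_approx(d: float) -> float:
    """Expansion d log2(1/d) + (d - d^2/2) / ln 2 from Eq. (89)."""
    return d * log2(1 / d) + (d - d * d / 2) / log(2)

def remainder(d: float) -> float:
    """The error term delta_3 = h(d) - h_approx(d)."""
    return h(d) - h_approx(d)

if __name__ == "__main__":
    for d in (0.1, 0.01, 0.001):
        print(d, remainder(d), remainder(d) / d ** 3)
```

Expanding $-(1-d)\ln(1-d) = d - d^2/2 - d^3/6 - \ldots$ shows that the remainder is negative and of order $d^3$, with the ratio `remainder(d) / d**3` approaching $-1/(6\ln 2)$ as $d \to 0$.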